We consider a manufacturing plant that purchases raw materials for product assembly and then
sells the final products to customers. There are M types of raw materials and K types of products,
and each product uses a certain subset of raw materials for assembly. The plant operates in
slotted time, and every slot it makes decisions about re-stocking materials and pricing the existing
products in reaction to (possibly time-varying) material costs and consumer demands. We develop
a dynamic purchasing and pricing policy that yields time average profit within $\epsilon$ of optimality, for
any given $\epsilon > 0$, with a worst case storage buffer requirement that is $O(1/\epsilon)$. The policy can be
implemented easily for large M, K, yields fast convergence times, and is robust to non-ergodic
system dynamics.

1. INTRODUCTION

This paper considers the problem of maximizing time average profit at a product 
assembly plant. The plant manages the purchasing, assembly, and pricing of M 
types of raw materials and K types of products. Specifically, the plant maintains 
a storage buffer for each of the M materials, and can assemble each product from 
some specific combination of materials. The system operates in slotted time with 
normalized slots $t \in \{0, 1, 2, \ldots\}$. Every slot, the plant makes decisions about
purchasing new raw materials and pricing the K products for sale to the consumer. 
This is done in reaction to material costs and consumer demand functions that 
are known on each slot but can change randomly from slot to slot according to a 
stationary process with a possibly unknown probability distribution. 

It is well known that the problem of maximizing time average profit in such a 
system can be treated using dynamic programming and Markov decision theory. 
A textbook example of this approach for a single product (single queue) problem
is given in [Bertsekas 1995], where inventory storage costs are also considered.
However, such approaches may be prohibitively complex for problems with large
dimension, as the state space grows exponentially with the number of queues. Further,
these techniques require knowledge of the probabilities that govern purchasing
costs and consumer demand functions. Case studies of multi-dimensional inventory
control are treated in [Roy et al. 1997] using a lower complexity neuro-dynamic
programming framework, which approximates the optimal value function used in
traditional dynamic programming. Such algorithms fine-tune the parameters of the
approximation by either offline simulations or online feedback (see also [Bertsekas
and Tsitsiklis 1996] [Powell 2007]).

In this paper, we consider a different approach that does not attempt to approximate
dynamic programming. Our algorithm reacts to the current system state and
does not require knowledge of the probabilities that affect future states. Under mild
ergodicity assumptions on the material supply and consumer demand processes, we
show that the algorithm can push time average profit to within $\epsilon$ of optimality, for
any arbitrarily small value $\epsilon > 0$. This can be achieved by finite storage buffers
of size $cT_\epsilon/\epsilon$, where c is a coefficient that is polynomial in K and M, and $T_\epsilon$ is a
constant that depends on the "mixing time" of the processes. In the special case
when these processes are i.i.d. over slots, we have $T_\epsilon = 1$ for all $\epsilon > 0$, and so
the buffers are size $O(1/\epsilon)$.$^1$ The algorithm can be implemented in real time even
for problems with large dimension (i.e., large K and M). Thus, our framework
circumvents the "curse of dimensionality" problems associated with dynamic programming.
This is because we are not asking the same question that could be asked
by dynamic programming approaches: Rather than attempting to maximize profit
subject to finite storage buffers, we attempt to reach the more difficult target of
pushing profit arbitrarily close to the maximum that can be achieved in systems
with infinite buffer space. We can approach this optimality with finite buffers of
size $O(1/\epsilon)$, although this may not be the optimal buffer size tradeoff (see [Neely
2007][Neely 2006b] for tradeoff-optimal algorithms in a communication network).
A dynamic program might be able to achieve the same profit with smaller buffers,
but would contend with curse of dimensionality issues.

Prior work on inventory control with system models similar to our own is found in 
[Aviv and Pazgal 2002] [Benjaafar and ElHafsi 2006] [Plambeck and Ward 2006] and 
references therein. Work in [Aviv and Pazgal 2002] considers a single-dimensional 
inventory problem where a fixed number of products are sold over a finite horizon 
with a constant but unknown customer arrival rate. A set of coupled differential
equations is derived for the optimal policy using Markov decision theory. Work
in [Benjaafar and ElHafsi 2006] provides structural results for multi-dimensional 
inventory problems with product assembly, again using Markov decision theory, 
and obtains numerical results for a two-dimensional system. A multi-dimensional 
product assembly problem is treated in [Plambeck and Ward 2006] for stochastic 
customer arrivals with fixed and known rates. The complexity issue is treated by 
considering a large volume limit and using results of heavy traffic theory. The 
work in [Plambeck and Ward 2006] also considers joint optimal price decisions, but 
chooses all prices at time zero and holds them constant for all time thereafter. 

Our analysis uses the "drift-plus-penalty" framework of stochastic network optimization
developed for queueing networks in [Georgiadis et al. 2006] [Neely et al. 2005]
[Neely 2006a]. Our problem is most similar to the work in [Jiang and Walrand 2009],
which uses this framework to address processing networks that queue components
that must be combined with other components. The work in [Jiang and Walrand 2009]
treats multi-hop networks and maximizes throughput and throughput-utility
in these systems using a deficit max-weight algorithm that uses "deficit
queues" to keep track of the deficit created when a component cannot be processed
due to a missing part. Our paper does not consider a multi-hop network, but has
similar challenges when we do not have enough inventory to build a desired product.
Rather than using deficit queues, we use a different type of Lyapunov function that 
avoids deficits entirely. Our formulation also considers the purchasing and pric- 
ing aspects of the problem, particularly for a manufacturing plant, and considers 
arbitrary (possibly non-ergodic) material supply and consumer demand processes. 

Previous work in [Huang and Neely 2007] uses the drift-plus-penalty framework 
in a related revenue maximization problem for a wireless service provider. In that 
context, a two-price result demonstrates that dynamic pricing must be used to 
maximize time average profit (a single price is often not enough, although two 
prices are sufficient). The problem in this paper can be viewed as the "inverse" of 
the service provider problem, and has an extra constraint that requires the plant 
to maintain enough inventory for a sale to take place. However, a similar two- 
price structure applies here, so that time-varying prices are generally required for 
optimality, even if material costs and consumer demands do not change with time. 
This is a simple phenomenon that often arises when maximizing the expectation of 
a non-concave profit function subject to a limited supply of raw materials. In the 
real world, product providers often use a regular price that applies most of the time, 
with reduced "sale" prices that are offered less frequently. While the incentives for 
two-price behavior in the real world are complex and are often related to product 
expiration dates (which is not part of our mathematical model), two-price (or multi- 
price) behavior can arise even in markets with non-perishable goods. Time varying 
prices also arise in other contexts, such as in the work [Aviv and Pazgal 2002] which 
treats the sale of a fixed amount of items over a finite time horizon. 

It is important to note that the term "dynamic pricing" is often associated with 
the practice of price discrimination between consumers with different demand func- 
tions. It is well known that charging different consumers different prices is tanta- 
lizingly profitable (but often illegal). Our model does not use such price discrimi- 
nation, as it offers the same price to all consumers. However, the revenue earned 
from our time-varying strategy may be indirectly reaping benefits that are sim- 
ilar to those achievable by price discrimination, without the inherent unfairness. 
This is because the aggregate demand function is composed of individual demands 
from consumers with different preferences, which can partially be exploited with a 
time-varying price that operates on two different price regions. 

The outline of this paper is as follows: In the next section we specify the system 
model. The optimal time average profit is characterized in Section 3, where the 
two-price behavior is also noted. Our dynamic control policy is developed in Section 
4 for an i.i.d. model of material cost and consumer demand states. Section 5 treats 
a more general ergodic model, and arbitrary (possibly non-ergodic) processes are 
treated in Section 6. 

2. SYSTEM MODEL 

There are M types of raw materials, and each is stored in a different storage buffer
at the plant. Define $Q_m(t)$ as the (integer) number of type m materials in the plant
on slot t. We temporarily assume all storage buffers have infinite space, and later
we show that our solution can be implemented with finite buffers of size $O(1/\epsilon)$,
where the $\epsilon$ parameter determines a profit-buffer tradeoff.

Let $Q(t) = (Q_1(t), \ldots, Q_M(t))$ be the vector of queue sizes, also called the inventory
vector. From these materials, the plant can manufacture K types of products.
Define $\beta_{mk}$ as the (integer) number of type m materials required for creation of a
single item of product k (for $m \in \{1, \ldots, M\}$ and $k \in \{1, \ldots, K\}$). We assume that
products are assembled quickly, so that a product requested during slot t can be
assembled on the same slot, provided that there are enough raw materials.$^2$ Thus,
the plant must have $Q_m(t) \geq \beta_{mk}$ for all $m \in \{1, \ldots, M\}$ in order to sell one product
of type k on slot t, and must have twice this amount of materials in order to sell
two type k products, etc. The simplest example is when each raw material itself
represents a finished product, which corresponds to the case K = M, $\beta_{mm} = 1$ for
all m, $\beta_{mk} = 0$ for $m \neq k$. However, our model allows for more complex assembly
structures, possibly with different products requiring some overlapping materials.
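
As a small illustration of the assembly structure, the following Python sketch checks whether a given inventory can support a requested batch of products. The helper name `can_assemble`, the requirement matrix, and the inventory numbers are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence

def can_assemble(q: Sequence[int], beta: Sequence[Sequence[int]], n: Sequence[int]) -> bool:
    """Return True if inventory q covers n[k] units of every product k.

    beta[m][k] is the number of type-m materials needed per unit of product k.
    """
    for m, row in enumerate(beta):
        needed = sum(b * units for b, units in zip(row, n))
        if q[m] < needed:
            return False
    return True

# Hypothetical example: M = 3 materials, K = 2 products sharing the material at index 1.
beta = [[1, 0],
        [2, 1],
        [0, 3]]
q = [2, 5, 6]

print(can_assemble(q, beta, [2, 1]))  # needs (2, 5, 3): True
print(can_assemble(q, beta, [2, 2]))  # needs (2, 6, 6): False, index 1 is short
```

The second call fails only because the two products compete for the shared material, which is the kind of overlap the model allows.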

Every slot t, the plant must decide how many new raw materials to purchase
and what price it should charge for its products. Let $A(t) = (A_1(t), \ldots, A_M(t))$
represent the vector of the (integer) number of new raw materials purchased on slot
t. Let $\tilde{D}(t) = (\tilde{D}_1(t), \ldots, \tilde{D}_K(t))$ be the vector of the (integer) number of products
sold on slot t. The queueing dynamics for $m \in \{1, \ldots, M\}$ are thus:

$$Q_m(t+1) = \max\left[Q_m(t) - \sum_{k=1}^{K} \beta_{mk} \tilde{D}_k(t),\ 0\right] + A_m(t) \qquad (1)$$

Below we describe the pricing decision model that affects product sales $\tilde{D}(t)$, and
the cost model associated with purchasing decisions A(t).

2.1 Product Pricing and the Consumer Demand Functions

For each slot t and each commodity k, the plant must decide if it desires to offer
commodity k for sale, and, if so, what price it should charge. Let $Z_k(t)$ represent a
binary variable that is 1 if commodity k is offered and is 0 else. Let $P_k(t)$ represent
the per-unit price for product k on slot t. We assume that prices $P_k(t)$ are chosen
within a compact set $\mathcal{P}_k$ of price options. Thus:

$$P_k(t) \in \mathcal{P}_k \text{ for all products } k \in \{1, \ldots, K\} \text{ and all slots } t \qquad (2)$$

The sets $\mathcal{P}_k$ include only non-negative prices and have a finite maximum price
$P_{k,max}$. For example, the set $\mathcal{P}_k$ might represent the interval $0 \leq p \leq P_{k,max}$,
or might represent a discrete set of prices separated by some minimum price unit.
Let $Z(t) = (Z_1(t), \ldots, Z_K(t))$ and $P(t) = (P_1(t), \ldots, P_K(t))$ be vectors of these
decision variables.

$^2$ Algorithms that yield similar performance but require products to be assembled one slot before
they are delivered can be designed based on simple modifications, briefly discussed in Section 4.8.

Let Y(t) represent the consumer demand state for slot t, which represents any
factors that affect the expected purchasing decisions of consumers on slot t. Let
$D(t) = (D_1(t), \ldots, D_K(t))$ be the resulting demand vector, where $D_k(t)$ represents
the (integer) amount of type k products that consumers want to buy in reaction
to the current price $P_k(t)$ and under the current demand state Y(t). Specifically,
we assume that $D_k(t)$ is a random variable that depends on $P_k(t)$ and Y(t), is
conditionally i.i.d. over all slots with the same $P_k(t)$ and Y(t) values, and satisfies:

$$F_k(p, y) = \mathbb{E}\{D_k(t) \mid P_k(t) = p, Y(t) = y\} \quad \forall p \in \mathcal{P}_k, y \in \mathcal{Y} \qquad (3)$$

The $F_k(p, y)$ function is assumed to be continuous in $p \in \mathcal{P}_k$ for each $y \in \mathcal{Y}$.$^3$ We
assume that the current demand state Y(t) is known to the plant at the beginning
of slot t, and that the demand function $F_k(p, y)$ is also known to the plant. The
process Y(t) takes values in a finite or countably infinite set $\mathcal{Y}$, and is assumed to
be stationary and ergodic with steady state probabilities $\pi(y)$, so that:

$$\pi(y) = Pr[Y(t) = y] \quad \forall y \in \mathcal{Y}, \forall t$$

The probabilities $\pi(y)$ are not necessarily known to the plant.

We assume that the maximum demand for each product $k \in \{1, \ldots, K\}$ is deterministically
bounded by a finite integer $D_{k,max}$, so that regardless of price P(t) or
the demand state Y(t), we have:

$$0 \leq D_k(t) \leq D_{k,max} \text{ for all slots } t \text{ and all products } k$$

This boundedness assumption is useful for analysis. Such a finite bound is natural
in cases when the maximum number of customers is limited on any given slot. The
bound might also be artificially enforced by the plant due to physical constraints
that limit the number of orders that can be fulfilled on one slot. Define $\mu_{m,max}$ as
the resulting maximum demand for raw materials of type m on a given slot:

$$\mu_{m,max} \triangleq \sum_{k=1}^{K} \beta_{mk} D_{k,max} \qquad (4)$$
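
The bound in (4) is a single matrix-vector product. A minimal Python sketch follows; the function name and the numbers are assumptions chosen only for illustration.

```python
def max_material_demand(beta, d_max):
    """mu_max[m] = sum_k beta[m][k] * d_max[k], as in (4)."""
    return [sum(b * d for b, d in zip(row, d_max)) for row in beta]

# Hypothetical requirement matrix (3 materials, 2 products) and demand bounds.
beta = [[1, 0],
        [2, 1],
        [0, 3]]
d_max = [4, 2]

mu_max = max_material_demand(beta, d_max)
print(mu_max)  # [4, 10, 6]
```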

If there is a sufficient amount of raw materials to fulfill all demands in the vector
D(t), and if $Z_k(t) = 1$ for all k such that $D_k(t) > 0$ (so that product k is offered for
sale), then the number of products sold is equal to the demand vector: $\tilde{D}(t) = D(t)$.
We are guaranteed to have enough inventory to meet the demands on slot t if
$Q_m(t) \geq \mu_{m,max}$ for all $m \in \{1, \ldots, M\}$. However, there may not always be enough
inventory to fulfill all demands, in which case we require a scheduling decision that
decides how many units of each product will be assembled to meet a subset of the
demands. The value of $\tilde{D}(t) = (\tilde{D}_1(t), \ldots, \tilde{D}_K(t))$ must be chosen as an integer
vector that satisfies the following scheduling constraints:

$$0 \leq \tilde{D}_k(t) \leq Z_k(t) D_k(t) \quad \forall k \in \{1, \ldots, K\} \qquad (5)$$

$$Q_m(t) \geq \sum_{k=1}^{K} \beta_{mk} \tilde{D}_k(t) \quad \forall m \in \{1, \ldots, M\} \qquad (6)$$
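
The constraints (5)-(6) only say which sales vectors are allowed; they do not prescribe how to pick one. The Python sketch below uses a simple greedy rule (serve products in index order) purely as an assumed example of a feasible scheduler, followed by the queue update (1). The function names and numbers are hypothetical.

```python
def greedy_schedule(q, beta, z, d):
    """Pick integer sales satisfying (5) and (6) by serving products in index order."""
    remaining = list(q)
    sold = []
    for k in range(len(d)):
        cap = z[k] * d[k]  # (5): nothing is sold if product k is not offered
        for m in range(len(q)):
            if beta[m][k] > 0:
                cap = min(cap, remaining[m] // beta[m][k])  # (6): stay within inventory
        for m in range(len(q)):
            remaining[m] -= beta[m][k] * cap
        sold.append(cap)
    return sold

def update_queues(q, beta, sold, a):
    """Queue update (1): remove the materials used, then add new purchases."""
    used = [sum(beta[m][k] * sold[k] for k in range(len(sold))) for m in range(len(q))]
    return [max(q[m] - used[m], 0) + a[m] for m in range(len(q))]

beta = [[1, 0],
        [2, 1],
        [0, 3]]
q = [2, 5, 6]

sold = greedy_schedule(q, beta, z=[1, 1], d=[3, 2])
print(sold)                                      # [2, 1]
print(update_queues(q, beta, sold, a=[1, 0, 2]))  # [1, 0, 5]
```

Other feasible rules (for example, serving the most profitable product first) would satisfy the same constraints.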

$^3$ This "continuity" is automatically satisfied in the case when $\mathcal{P}_k$ is a finite set of points. Continuity
of $F_k(p, y)$ and compactness of $\mathcal{P}_k$ ensures that linear functionals of $F_k(p, y)$ have well defined
maximizers $p \in \mathcal{P}_k$.

2.2 Raw Material Purchasing Costs

Let X(t) represent the raw material supply state on slot t, which contains components
that affect the purchase price of new raw materials. Specifically, we assume
that X(t) has the form:

$$X(t) = [(x_1(t), \ldots, x_M(t)); (s_1(t), \ldots, s_M(t))]$$

where $x_m(t)$ is the per-unit price of raw material m on slot t, and $s_m(t)$ is the
maximum amount of raw material m available for sale on slot t. We assume that
X(t) takes values on some finite or countably infinite set $\mathcal{X}$, and that X(t) is
stationary and ergodic with probabilities:

$$\pi(x) = Pr[X(t) = x] \quad \forall x \in \mathcal{X}, \forall t$$

The $\pi(x)$ probabilities are not necessarily known to the plant.

Let c(A(t), X(t)) be the total cost incurred by the plant for purchasing a vector
A(t) of new materials under the supply state X(t):

$$c(A(t), X(t)) \triangleq \sum_{m=1}^{M} x_m(t) A_m(t) \qquad (7)$$

We assume that A(t) is limited by the constraint $A(t) \in \mathcal{A}(X(t))$, where $\mathcal{A}(X(t))$
is the set of all vectors $A(t) = (A_1(t), \ldots, A_M(t))$ such that for all t:

$$0 \leq A_m(t) \leq \min[A_{m,max}, s_m(t)] \quad \forall m \in \{1, \ldots, M\} \qquad (8)$$

$$A_m(t) \text{ is an integer} \quad \forall m \in \{1, \ldots, M\} \qquad (9)$$

$$c(A(t), X(t)) \leq c_{max} \qquad (10)$$

where $A_{m,max}$ and $c_{max}$ are finite bounds on the total amount of each raw material
that can be purchased, and the total cost of these purchases on one slot, respectively.
These finite bounds might arise from the limited supply of raw materials, or
might be artificially imposed by the plant in order to limit the risk associated with
investing in new raw materials on any given slot. A simple special case is when
there is a finite maximum price $x_{m,max}$ for raw material m at any time, and when
$c_{max} = \sum_{m=1}^{M} x_{m,max} A_{m,max}$. In this case, the constraint (10) is redundant.
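
The purchasing model can be sketched in a few lines of Python. The sketch assumes that (8) caps each purchase by both a per-material limit and the amount on offer, and that (10) is a per-slot budget on the cost (7); the function names and all numbers are illustrative assumptions.

```python
def purchase_cost(a, x):
    """Total cost (7): sum_m x[m] * a[m], with x[m] the per-unit price of material m."""
    return sum(price * qty for price, qty in zip(x, a))

def is_feasible_purchase(a, x, s, a_max, c_max):
    """Check a candidate purchase vector against constraints (8)-(10)."""
    for qty, avail, cap in zip(a, s, a_max):
        if not isinstance(qty, int):           # (9): integer quantities
            return False
        if qty < 0 or qty > min(cap, avail):   # (8): bounded by cap and by supply
            return False
    return purchase_cost(a, x) <= c_max        # (10): per-slot budget

x = [2.0, 0.5, 1.5]   # hypothetical unit prices x_m(t)
s = [10, 3, 8]        # hypothetical supply limits s_m(t)
a_max = [5, 5, 5]     # hypothetical purchase caps

print(purchase_cost([2, 3, 1], x))                              # 7.0
print(is_feasible_purchase([2, 3, 1], x, s, a_max, c_max=8.0))  # True
print(is_feasible_purchase([2, 4, 1], x, s, a_max, c_max=8.0))  # False: only 3 units on offer
```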

2.3 The Maximum Profit Objective 

Every slot t, the plant observes the current queue vector Q(t), the current demand
state Y(t), and the current supply state X(t), and chooses a purchase vector $A(t) \in
\mathcal{A}(X(t))$ and pricing vectors Z(t), P(t) (with $Z_k(t) \in \{0, 1\}$ and $P_k(t) \in \mathcal{P}_k$ for
all $k \in \{1, \ldots, K\}$). The consumers then react by generating a random demand
vector D(t) with expectations given by (3). The actual number of products filled
is scheduled by choosing the $\tilde{D}(t)$ vector according to the scheduling constraints
(5)-(6), and the resulting queueing update is given by (1).

For each $k \in \{1, \ldots, K\}$, define $\alpha_k$ as a fixed (non-negative) cost associated with
assembling one product of type k. Define a process $\phi(t)$ as follows:

$$\phi(t) \triangleq -c(A(t), X(t)) + \sum_{k=1}^{K} Z_k(t) D_k(t) (P_k(t) - \alpha_k) \qquad (11)$$

The value of $\phi(t)$ represents the total instantaneous profit due to material purchasing
and product sales on slot t, under the assumption that all demands are
fulfilled (so that $\tilde{D}_k(t) = D_k(t)$ for all k). Define $\phi_{actual}(t)$ as the actual instantaneous
profit, defined by replacing the $D_k(t)$ values in the right hand side of (11)
with $\tilde{D}_k(t)$ values. Note that $\phi(t)$ can be either positive, negative, or zero, as can
$\phi_{actual}(t)$.
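
To make the difference between the two profit notions concrete, the following Python sketch evaluates the right hand side of (11) once with the demands and once with the fulfilled sales. The function name and all numbers are hypothetical.

```python
def profit(a, x, z, units, p, alpha):
    """Instantaneous profit in the form of (11) for a given vector of unit sales."""
    cost = sum(xm * am for xm, am in zip(x, a))
    revenue = sum(zk * uk * (pk - ak) for zk, uk, pk, ak in zip(z, units, p, alpha))
    return revenue - cost

x, a = [2.0, 0.5], [1, 2]                      # material prices and purchases
z, p, alpha = [1, 1], [5.0, 4.0], [1.0, 0.5]   # offers, prices, assembly costs
demand = [3, 2]
fulfilled = [2, 2]                             # one unit of product 0 could not be built

print(profit(a, x, z, demand, p, alpha))     # 16.0, profit if all demand were met
print(profit(a, x, z, fulfilled, p, alpha))  # 12.0, actual profit
```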

Define time average expectations $\overline{\phi}$ and $\overline{\phi}_{actual}$ as follows:

$$\overline{\phi} \triangleq \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi(\tau)\}, \qquad \overline{\phi}_{actual} \triangleq \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi_{actual}(\tau)\}$$

The goal of the plant is to maximize the time average expected profit $\overline{\phi}_{actual}$.
For convenience, a table of notation is given in Table I.



Table I. Table of Notation

Notation                                      Definition
$X(t)$                                        Supply state, $\pi(x) = Pr[X(t) = x]$ for $x \in \mathcal{X}$
$A(t) = (A_1(t), \ldots, A_M(t))$             Raw material purchase vector for slot t
$c(A(t), X(t))$                               Raw material cost function
$\mathcal{A}(X(t))$                           Constraint set for decision variables A(t)
$Y(t)$                                        Consumer demand state, $\pi(y) = Pr[Y(t) = y]$ for $y \in \mathcal{Y}$
$Z(t) = (Z_1(t), \ldots, Z_K(t))$             0/1 sale vector
$P(t) = (P_1(t), \ldots, P_K(t))$             Price vector, $P_k(t) \in \mathcal{P}_k$
$D(t) = (D_1(t), \ldots, D_K(t))$             Random demand vector (in reaction to P(t))
$F_k(p, y)$                                   Demand function, $F_k(p, y) = \mathbb{E}\{D_k(t) \mid P_k(t) = p, Y(t) = y\}$
$Q(t) = (Q_1(t), \ldots, Q_M(t))$             Queue vector of raw materials in inventory
$Q_{m,max}$                                   Maximum buffer size of queue m
$\alpha_k$                                    Cost incurred by assembly of one product of type k
$\beta_{mk}$                                  Number of type m raw materials needed for assembly of product type k
$\phi(t)$                                     Instantaneous profit variable for slot t (given by (11))
$\mu(t) = (\mu_1(t), \ldots, \mu_M(t))$       Departure vector for raw materials, $\mu_m(t) = \sum_{k=1}^{K} \beta_{mk} Z_k(t) D_k(t)$
$\tilde{D}(t)$, $\tilde{\mu}(t)$              Actual fulfilled demands and raw materials used for slot t
$\phi_{actual}(t)$                            Actual instantaneous profit for slot t

3. CHARACTERIZING MAXIMUM TIME AVERAGE PROFIT 

Assume infinite buffer capacity (so that $Q_{m,max} = \infty$ for all $m \in \{1, \ldots, M\}$).
Consider any control algorithm that makes decisions for Z(t), P(t), A(t), and
also makes scheduling decisions for $\tilde{D}(t)$, according to the system structure as
described in the previous section. Define $\phi^{opt}$ as the maximum time average profit
over all such algorithms, so that all algorithms must satisfy $\overline{\phi}_{actual} \leq \phi^{opt}$, but
there exist algorithms that can yield profit arbitrarily close to $\phi^{opt}$. The value
of $\phi^{opt}$ is determined by the steady state distributions $\pi(x)$ and $\pi(y)$, the cost
function c(A(t), X(t)), and the demand functions $F_k(p, y)$ according to the following
theorem.

Theorem 1. (Maximum Time Average Profit) Suppose the initial queue states
satisfy $\mathbb{E}\{Q_m(0)\} < \infty$ for all $m \in \{1, \ldots, M\}$. Then under any control algorithm,
the time average achieved profit satisfies:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi_{actual}(\tau)\} \leq \phi^{opt}$$

where $\phi^{opt}$ is the maximum value of the objective function in the following optimization
problem, defined in terms of auxiliary variables $\hat{c}$, $\hat{r}$, $\theta(a, x)$, $\hat{a}_m$, $\hat{\mu}_m$ (for
all $x \in \mathcal{X}$, $a \in \mathcal{A}(x)$, $m \in \{1, \ldots, M\}$):

$$\text{Maximize:} \quad \phi$$

$$\text{Subject to:} \quad \phi = -\hat{c} + \hat{r}, \quad \hat{a}_m \geq \hat{\mu}_m \quad \forall m$$

$$\hat{c} = \sum_{x \in \mathcal{X}} \pi(x) \sum_{a \in \mathcal{A}(x)} \theta(a, x) c(a, x)$$

$$\hat{r} = \sum_{y \in \mathcal{Y}} \pi(y) \sum_{k=1}^{K} \mathbb{E}\{Z_k(t)(P_k(t) - \alpha_k) F_k(P_k(t), y) \mid Y(t) = y\}$$

$$\hat{a}_m = \sum_{x \in \mathcal{X}} \pi(x) \sum_{a \in \mathcal{A}(x)} \theta(a, x) a_m \quad \forall m$$

$$\hat{\mu}_m = \sum_{y \in \mathcal{Y}} \pi(y) \sum_{k=1}^{K} \beta_{mk} \mathbb{E}\{Z_k(t) F_k(P_k(t), y) \mid Y(t) = y\} \quad \forall m$$

$$0 \leq \theta(a, x) \leq 1 \quad \forall x \in \mathcal{X}, a \in \mathcal{A}(x)$$

$$\sum_{a \in \mathcal{A}(x)} \theta(a, x) = 1 \quad \forall x \in \mathcal{X}$$

$$P_k(t) \in \mathcal{P}_k, \quad Z_k(t) \in \{0, 1\} \quad \forall k, t$$

where $P(t) = (P_1(t), \ldots, P_K(t))$ and $Z(t) = (Z_1(t), \ldots, Z_K(t))$ are vectors randomly
chosen with a conditional distribution that can be chosen as any distribution
that depends only on the observed value of Y(t) = y. The expectations in the above
problem are with respect to the chosen conditional distributions for these decisions.

Proof. See Appendix A. □

In Section 4 we show that algorithms can be designed to achieve a time average
profit $\overline{\phi}$ that is within $\epsilon$ of the value $\phi^{opt}$ defined in Theorem 1, for any arbitrarily
small $\epsilon > 0$. Thus, $\phi^{opt}$ represents the optimal time average profit over all possible
algorithms.

The variables in Theorem 1 can be interpreted as follows: The variable $\theta(a, x)$
represents a conditional probability of choosing A(t) = a given that the plant observes
supply state X(t) = x. The variable $\hat{c}$ thus represents the time average cost of
purchasing raw materials under this stationary randomized policy, and the variable
$\hat{r}$ represents the time average revenue for selling products. The variables $\hat{a}_m$ and
$\hat{\mu}_m$ represent the time average arrival and departure rates for queue m, respectively.
The above theorem thus characterizes $\phi^{opt}$ in terms of all possible stationary randomized
control algorithms, that is, all algorithms that make randomized choices
for A(t), Z(t), P(t) according to fixed conditional distributions given the supply
state X(t) and demand state Y(t). Note that Theorem 1 contains no variables for
the scheduling decisions for $\tilde{D}(t)$, made subject to (5)-(6). Such scheduling decisions
allow choosing $\tilde{D}(t)$ in reaction to the demands D(t), and hence allow more
flexibility beyond the choice of the Z(t) and P(t) variables alone (which must be
chosen before the demands D(t) are observed). That such additional scheduling
options cannot be exploited to increase time average profit is a consequence of our
proof of Theorem 1.

We say that a policy is (X,Y)-only if it chooses P(t), Z(t), A(t) values as a
stationary and randomized function only of the current observed X(t) and Y(t)
states. Because the sets $\mathcal{P}_k$ are compact and the functions $F_k(p, y)$ are continuous
in $p \in \mathcal{P}_k$ for each $y \in \mathcal{Y}$, it can be shown that the value of $\phi^{opt}$ in Theorem 1 can
be achieved by a particular (X,Y)-only policy, as shown in the following corollary.

Corollary 1. There exists an (X,Y)-only policy $P^*(t)$, $Z^*(t)$, $A^*(t)$ such
that:$^4$

$$\mathbb{E}\{\phi^*(t)\} = \phi^{opt} \qquad (12)$$

$$\mathbb{E}\{A_m^*(t)\} = \mathbb{E}\{\mu_m^*(t)\} \quad \forall m \in \{1, \ldots, M\} \qquad (13)$$

where $\phi^{opt}$ is the optimal time average profit defined in Theorem 1, and where
$\mathbb{E}\{\phi^*(t)\}$ and $\mathbb{E}\{\mu_m^*(t)\}$ are given by:

$$\mathbb{E}\{\phi^*(t)\} = -\mathbb{E}\{c(A^*(t), X(t))\} + \sum_{k=1}^{K} \mathbb{E}\{Z_k^*(t)(P_k^*(t) - \alpha_k) F_k(P_k^*(t), Y(t))\}$$

$$\mathbb{E}\{\mu_m^*(t)\} = \sum_{k=1}^{K} \beta_{mk} \mathbb{E}\{Z_k^*(t) F_k(P_k^*(t), Y(t))\} \quad \forall m \in \{1, \ldots, M\}$$

where the expectations are with respect to the stationary probability distributions
$\pi(x)$ and $\pi(y)$ for X(t) and Y(t), and the (potentially randomized) decisions for
$A^*(t)$, $Z^*(t)$, $P^*(t)$ that depend on X(t) and Y(t).

$^4$ Note that in (13) we have changed the "$\geq$" into "=". It is easy to show that doing so in Theorem
1 does not result in any loss of optimality.

3.1 On the Sufficiency of Two Prices

It can be shown that the (X,Y)-only policy of Corollary 1 can be used to achieve
time average profit arbitrarily close to optimal as follows: Define a parameter $\rho$
such that $0 < \rho < 1$. Use the (X,Y)-only decisions for $P^*(t)$ and $A^*(t)$ every
slot t, but use new decisions $Z_k(t) = Z_k^*(t) 1_k(t)$, where $1_k(t)$ is an i.i.d. Bernoulli
process with $Pr[1_k(t) = 1] = \rho$. It follows that the inequality (13) becomes:

$$\mathbb{E}\{A_m^*(t)\} = \mathbb{E}\{\mu_m^*(t)\} = (1/\rho) \mathbb{E}\{\mu_m(t)\}$$

where $\mu_m(t)$ corresponds to the new decisions $Z_k(t)$. It follows that all queues with
non-zero arrival rates $\mathbb{E}\{A_m^*(t)\}$ have these rates strictly greater than the expected
service rates $\mathbb{E}\{\mu_m(t)\}$, and so these queues grow to infinity with probability 1.
It follows that we always have enough material to meet the consumer demands,
so that $\tilde{D}(t) = D(t)$ and the scheduling decisions (5)-(6) become irrelevant. This
reduces profit only by a factor $O(1 - \rho)$, which can be made arbitrarily small as
$\rho \to 1$.

Here we show that the (X,Y)-only policy of Corollary 1 can be changed into
an (X,Y)-only policy that randomly chooses between at most two prices for each
unique product $k \in \{1, \ldots, K\}$ and each unique demand state $Y(t) \in \mathcal{Y}$, while still
satisfying (12)-(13). This result is based on a similar two-price theorem derived in
[Huang and Neely 2007] for the case of a service provider with a single queue. We
extend the result here to the case of a product provider with multiple queues.

Theorem 2. Suppose there exists an (X,Y)-only algorithm that allocates Z(t)
and P(t) to yield (for some given values $\hat{r}$ and $\hat{\mu}_m$ for $m \in \{1, \ldots, M\}$):

$$\sum_{k=1}^{K} \mathbb{E}\{Z_k(t)(P_k(t) - \alpha_k) F_k(P_k(t), Y(t))\} \geq \hat{r} \qquad (14)$$

$$\sum_{k=1}^{K} \beta_{mk} \mathbb{E}\{Z_k(t) F_k(P_k(t), Y(t))\} \leq \hat{\mu}_m \quad \text{for all } m \in \{1, \ldots, M\} \qquad (15)$$

Then the same inequality constraints can be achieved by a new stationary randomized
policy $Z^*(t)$, $P^*(t)$ that uses at most two prices for each unique product
$k \in \{1, \ldots, K\}$ and each unique demand state $Y(t) \in \mathcal{Y}$.

Proof. The proof is given in Appendix B. □ 

The expectation on the left hand side of (14) represents the expected revenue
generated from sales under the original (X,Y)-only policy, and the expectation on
the left hand side of (15) represents the expected departures from queue $Q_m(t)$
under this policy. The theorem says that the pricing part of the (X,Y)-only algorithm,
which potentially uses many different price options, can be changed to a
2-price algorithm without decreasing revenue or increasing demand for materials.

Simple examples can be given to show that two prices are often necessary to
achieve maximum time average profit, even when user demand functions are the
same for all slots (see [Huang and Neely 2007] for a simple example for the related
service-provider problem). We emphasize that the (X,Y)-only policy of Corollary
1 is not necessarily practical, as implementation would require full knowledge of the
$\pi(x)$ and $\pi(y)$ distributions, and it would require a solution to the (very complex)
optimization problem of Theorem 1 even if the $\pi(x)$ and $\pi(y)$ distributions were
known. Further, it relies on having an infinite buffer capacity (so that $Q_{m,max} = \infty$
for all $m \in \{1, \ldots, M\}$). A more practical algorithm is developed in the next section
that overcomes these difficulties.
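
The two-price effect can be seen in a toy instance. The Python sketch below is not the example from [Huang and Neely 2007]; it assumes a single product that consumes one unit of a single material, zero assembly cost, a fixed demand state, and a made-up menu of three prices. It compares the best single price against the best randomization between two menu options, subject to an average material usage of at most `mu_hat` per slot.

```python
from itertools import combinations

# Hypothetical menu: (price, expected demand F(p)). The "no sale" option (Z = 0)
# is modelled as price 0.0 with zero demand.
options = [(0.0, 0.0), (10.0, 1.0), (6.0, 2.0), (5.0, 4.0)]
mu_hat = 3.0  # assumed bound on average material usage per slot

def revenue(option):
    price, demand = option
    return price * demand

# Best single price whose demand rate respects the supply bound.
best_single = max(revenue(o) for o in options if o[1] <= mu_hat)

# Best randomization between two options, using the second with probability w.
# Revenue is linear in w, so only the largest feasible w needs checking
# (w = 0 is already covered by best_single).
best_mixed = best_single
for lo, hi in combinations(sorted(options, key=lambda o: o[1]), 2):
    if lo[1] > mu_hat:
        continue
    w = 1.0 if hi[1] <= mu_hat else (mu_hat - lo[1]) / (hi[1] - lo[1])
    best_mixed = max(best_mixed, (1 - w) * revenue(lo) + w * revenue(hi))

print(best_single)           # 12.0
print(round(best_mixed, 2))  # 16.67
```

In this instance, alternating between the high price (one third of the slots) and the low price (two thirds of the slots) uses exactly the allowed material rate and earns more than any fixed price, because revenue is not a concave function of the demand rate.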

4. A DYNAMIC PRICING AND PURCHASING ALGORITHM 

Here we construct a dynamic algorithm that makes purchasing and pricing decisions
in reaction to the current queue sizes and the observed X(t) and Y(t) states, without
knowledge of the $\pi(x)$ and $\pi(y)$ probabilities that govern the evolution of these
states. We begin with the assumption that X(t) is i.i.d. over slots with probabilities
$\pi(x) = Pr[X(t) = x]$, and Y(t) is i.i.d. over slots with $\pi(y) = Pr[Y(t) = y]$. This
assumption is extended to more general non-i.i.d. processes in Sections 5 and 6.

Define $1_k(t)$ as an indicator variable that is 1 if and only if $Q_m(t) < \mu_{m,max}$ for
some queue m such that $\beta_{mk} > 0$ (so that type m raw material is used to create
product k):

$$1_k(t) \triangleq \begin{cases} 1 & \text{if } Q_m(t) < \mu_{m,max} \text{ for some } m \text{ such that } \beta_{mk} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (16)$$
To begin, let us choose an algorithm from the restricted class of algorithms that
choose Z(t) values to satisfy the following edge constraint every slot t:

$$\text{For all } k \in \{1, \ldots, K\}, \text{ we have: } Z_k(t) = 0 \text{ whenever } 1_k(t) = 1 \qquad (17)$$

That is, the edge constraint (17) ensures that no type k product is sold unless
inventory in all of its corresponding raw material queues m is at least $\mu_{m,max}$. Under
this restriction, we always have enough raw material for any generated demand
vector, and so $\tilde{D}_k(t) = D_k(t)$ for all products k and all slots t. Thus, from (11) we
have $\phi_{actual}(t) = \phi(t)$. Define $\mu_m(t)$ as the number of material queue m departures
on slot t:

$$\mu_m(t) \triangleq \sum_{k=1}^{K} \beta_{mk} Z_k(t) D_k(t) \qquad (18)$$

The queueing dynamics of (1) thus become:

$$Q_m(t+1) = Q_m(t) - \mu_m(t) + A_m(t) \qquad (19)$$

The above equation continues to assume we have infinite buffer space, but we soon
show that we need only a finite buffer to implement our solution.
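
The indicator (16), the edge constraint (17), and the simplified dynamics (18)-(19) fit together as in the following Python sketch. The function names and the numerical values are assumptions made for illustration; the thresholds `mu_max` are the values that (4) would give for hypothetical demand bounds of 4 and 2.

```python
def shortage_indicator(q, beta, mu_max, k):
    """Indicator (16): 1 if some material used by product k is below its threshold."""
    return int(any(beta[m][k] > 0 and q[m] < mu_max[m] for m in range(len(q))))

def apply_edge_constraint(z, q, beta, mu_max):
    """Edge constraint (17): force Z_k = 0 whenever the indicator is 1."""
    return [0 if shortage_indicator(q, beta, mu_max, k) else z[k] for k in range(len(z))]

def step(q, beta, z, d, a):
    """Departures (18) and queue update (19); valid when z satisfies (17)."""
    mu = [sum(beta[m][k] * z[k] * d[k] for k in range(len(z))) for m in range(len(q))]
    return [q[m] - mu[m] + a[m] for m in range(len(q))]

beta = [[1, 0],
        [2, 1],
        [0, 3]]
mu_max = [4, 10, 6]
q = [6, 12, 5]  # the last material sits below its threshold of 6

z = apply_edge_constraint([1, 1], q, beta, mu_max)
print(z)                                        # [1, 0]
print(step(q, beta, z, d=[3, 2], a=[0, 1, 2]))  # [3, 7, 7]
```

Because product 1 is withheld, no queue can go negative in `step`, which is why the update needs no clipping at zero.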

4.1 Lyapunov Drift 

For a given set of non-negative parameters $\{\theta_m\}$ for $m \in \{1, \ldots, M\}$, define the
non-negative Lyapunov function L(Q(t)) as follows:

$$L(Q(t)) \triangleq \frac{1}{2} \sum_{m=1}^{M} (Q_m(t) - \theta_m)^2 \qquad (20)$$

This Lyapunov function is similar to that used for stock trading problems in [Neely
2009], and has the flavor of keeping queue backlog near a non-zero value $\theta_m$, as in
[Neely 2007]. Define the conditional Lyapunov drift $\Delta(Q(t))$ as follows:$^5$

$$\Delta(Q(t)) \triangleq \mathbb{E}\{L(Q(t+1)) - L(Q(t)) \mid Q(t)\} \qquad (21)$$

Define a constant V > 0, to be used to affect the revenue-storage tradeoff. Using
the stochastic optimization technique of [Georgiadis et al. 2006], our approach is
to design a strategy that, every slot t, observes current system conditions Q(t),
X(t), Y(t) and makes pricing and purchasing decisions to minimize a bound on the
following "drift-plus-penalty" expression:

$$\Delta(Q(t)) - V \mathbb{E}\{\phi(t) \mid Q(t)\}$$

where $\phi(t)$ is the instantaneous profit function defined in (11).



5 Strictly speaking, we should use the notation $\Delta(Q(t), t)$, as the drift may be non-stationary.
However, we use the simpler notation $\Delta(Q(t))$ as a formal representation of the right hand side
of (21).
4.2 Computing the Drift

We have the following lemma.

Lemma 1. (Drift Computation) Under any algorithm that satisfies the edge constraint
(17), and for any constants $V \ge 0$, $\theta_m \ge 0$ for $m \in \{1,\dots,M\}$, the Lyapunov
drift $\Delta(Q(t))$ satisfies:

\[ \Delta(Q(t)) - V \mathbb{E}\{\phi(t) \mid Q(t)\} \le B - V \mathbb{E}\{\phi(t) \mid Q(t)\} + \sum_{m=1}^{M} (Q_m(t) - \theta_m) \mathbb{E}\{A_m(t) - \mu_m(t) \mid Q(t)\} \tag{22} \]

where the constant $B$ is defined:

\[ B \triangleq \frac{1}{2} \sum_{m=1}^{M} \max[A_{m,\max}^2, \mu_{m,\max}^2] \tag{23} \]

Proof. The edge constraint (17) ensures that the dynamics (19) hold for all $t$.
By squaring (19) we have:

\[ (Q_m(t+1) - \theta_m)^2 = (Q_m(t) - \theta_m)^2 + (A_m(t) - \mu_m(t))^2 + 2(Q_m(t) - \theta_m)(A_m(t) - \mu_m(t)) \]

Dividing by 2, summing over $m \in \{1,\dots,M\}$, and taking conditional expectations
yields:

\[ \Delta(Q(t)) = \mathbb{E}\{B(t) \mid Q(t)\} + \sum_{m=1}^{M} (Q_m(t) - \theta_m) \mathbb{E}\{A_m(t) - \mu_m(t) \mid Q(t)\} \]

where $B(t)$ is defined:

\[ B(t) \triangleq \frac{1}{2} \sum_{m=1}^{M} (A_m(t) - \mu_m(t))^2 \tag{24} \]

By the finite bounds on $A_m(t)$ and $\mu_m(t)$, we clearly have $B(t) \le B$ for all $t$. □
Now note from the definition of $\mu_m(t)$ in (18) that:

\[ \mathbb{E}\{\mu_m(t) \mid Q(t)\} = \mathbb{E}\Big\{ \sum_{k=1}^{K} \beta_{mk} Z_k(t) D_k(t) \,\Big|\, Q(t) \Big\} \]
\[ = \sum_{k=1}^{K} \beta_{mk} \mathbb{E}\{ \mathbb{E}\{Z_k(t) D_k(t) \mid Q(t), P_k(t), Y(t)\} \mid Q(t) \} \]
\[ = \sum_{k=1}^{K} \beta_{mk} \mathbb{E}\{ Z_k(t) F_k(P_k(t), Y(t)) \mid Q(t) \} \tag{25} \]

where we have used the law of iterated expectations in the final equality. Similarly,
we have from the definition of $\phi(t)$ in (11):

\[ \mathbb{E}\{\phi(t) \mid Q(t)\} = -\mathbb{E}\{c(A(t), X(t)) \mid Q(t)\} + \sum_{k=1}^{K} \mathbb{E}\{ Z_k(t)(P_k(t) - \alpha_k) F_k(P_k(t), Y(t)) \mid Q(t) \} \tag{26} \]
Plugging (25) and (26) into (22) yields:

\[ \Delta(Q(t)) - V \mathbb{E}\{\phi(t) \mid Q(t)\} \le B + \sum_{m=1}^{M} (Q_m(t) - \theta_m) \mathbb{E}\Big\{ A_m(t) - \sum_{k=1}^{K} \beta_{mk} Z_k(t) F_k(P_k(t), Y(t)) \,\Big|\, Q(t) \Big\} \]
\[ \qquad + V \mathbb{E}\{c(A(t), X(t)) \mid Q(t)\} - V \mathbb{E}\Big\{ \sum_{k=1}^{K} Z_k(t)(P_k(t) - \alpha_k) F_k(P_k(t), Y(t)) \,\Big|\, Q(t) \Big\} \tag{27} \]

In particular, the right hand side of (27) is identical to the right hand side of (22). 
4.3 The Dynamic Purchasing and Pricing Algorithm 

Minimizing the right hand side of (27) over all feasible choices of $A(t)$, $Z(t)$,
$P(t)$ (given the observed $X(t)$ and $Y(t)$ states and the observed queue values $Q(t)$)
yields the following algorithm:

Joint Purchasing and Pricing Algorithm (JPP): Every slot $t$, perform the
following actions:

(1) Purchasing: Observe $Q(t)$ and $X(t)$, and choose $A(t) = (A_1(t), \dots, A_M(t))$ as
the solution of the following optimization problem defined for slot $t$:

\[ \text{Minimize:} \quad V c(A(t), X(t)) + \sum_{m=1}^{M} A_m(t)(Q_m(t) - \theta_m) \tag{28} \]
\[ \text{Subject to:} \quad A(t) \in \mathcal{A}(X(t)) \tag{29} \]

where $\mathcal{A}(X(t))$ is defined by constraints (8)-(10).

(2) Pricing: Observe $Q(t)$ and $Y(t)$. For each product $k \in \{1,\dots,K\}$, if $1_k(t) = 1$,
choose $Z_k(t) = 0$ and do not offer product $k$ for sale. If $1_k(t) = 0$, choose $P_k(t)$
as the solution to the following problem:

\[ \text{Maximize:} \quad V(P_k(t) - \alpha_k) F_k(P_k(t), Y(t)) + F_k(P_k(t), Y(t)) \sum_{m=1}^{M} \beta_{mk}(Q_m(t) - \theta_m) \tag{30} \]
\[ \text{Subject to:} \quad P_k(t) \in \mathcal{P}_k \tag{31} \]

If the above maximization is positive, set $Z_k(t) = 1$ and keep $P_k(t)$ as the above
value. Else, set $Z_k(t) = 0$ and do not offer product $k$ for sale.

(3) Queue Update: Fulfill all demands $D_k(t)$, and update the queues $Q_m(t)$ according
to (19) (noting by construction of this algorithm that $\tilde{D}(t) = D(t)$ for all
$t$, so that the dynamics (19) are equivalent to (1)).

The above JPP algorithm does not require knowledge of probability distributions
$\pi(x)$ or $\pi(y)$, and is decoupled into separate policies for pricing and purchasing. The
pricing policy is quite simple and involves maximizing a (possibly non-concave)
function of one variable $P_k(t)$ over the 1-dimensional set $\mathcal{P}_k$. For example, if $\mathcal{P}_k$
is a discrete set of 1000 price options, this involves evaluating the function over
each option and choosing the maximizing price. The purchasing policy is more
complex, as $A(t)$ is an integer vector that must satisfy (8)-(10). This is a knapsack
problem due to the constraint (10). However, the decision is trivial in the case when
$c_{\max} = \sum_{m=1}^{M} x_{m,\max} A_{m,\max}$, in which case the constraint (10) is redundant.
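
The two decision rules can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a linear purchase cost $c(A, x) = \sum_m x_m A_m$ with constraint (10) redundant (so that the objective (28) separates across materials), a finite price set, and a caller-supplied demand function `F(k, p, y)`; all function and parameter names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def jpp_purchase(Q: Sequence[float], theta: Sequence[float], x: Sequence[float],
                 A_max: Sequence[int], V: float) -> List[int]:
    """Purchasing step (28) under the ASSUMED linear cost sum_m x[m] * A[m].

    The objective becomes sum_m A[m] * (V*x[m] + Q[m] - theta[m]), so each
    material is bought at its maximum when its coefficient is negative, else 0.
    """
    return [A_max[m] if V * x[m] + Q[m] - theta[m] < 0 else 0 for m in range(len(Q))]


def jpp_price(k: int, Q: Sequence[float], theta: Sequence[float],
              beta: Sequence[Sequence[float]], alpha: Sequence[float],
              prices: Sequence[float], F: Callable[[int, float, float], float],
              y: float, V: float, edge: bool) -> Tuple[int, Optional[float]]:
    """Pricing step (30)-(31) for product k by exhaustive search over `prices`.

    Returns (Z_k, P_k); P_k is None when the product is not offered.
    """
    if edge:  # edge constraint (17): some required material is too low
        return 0, None
    weight = sum(beta[m][k] * (Q[m] - theta[m]) for m in range(len(Q)))

    def objective(p: float) -> float:
        # V (p - alpha_k) F + F * sum_m beta_mk (Q_m - theta_m)
        return (V * (p - alpha[k]) + weight) * F(k, p, y)

    best = max(prices, key=objective)
    return (1, best) if objective(best) > 0 else (0, None)
```

The exhaustive search mirrors the "evaluate each of the price options" remark above. With a non-linear cost or a binding constraint (10), the purchasing step would need a knapsack-style solver instead of this closed-form rule.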

4.4 Deterministic Queue Bounds 

We have the following simple lemma that shows the above policy can be imple- 
mented on a finite buffer system. 

Lemma 2. (Finite Buffer Implementation) If initial inventory satisfies $Q_m(0) \le Q_{m,\max}$
for all $m \in \{1,\dots,M\}$, where $Q_{m,\max} \triangleq \theta_m + A_{m,\max}$, then JPP yields
$Q_m(t) \le Q_{m,\max}$ for all slots $t \ge 0$ and all queues $m \in \{1,\dots,M\}$.

Proof. Fix a queue $m$, and suppose that $Q_m(t) \le Q_{m,\max}$ for some slot $t$ (this
holds by assumption at $t = 0$). We show that $Q_m(t+1) \le Q_{m,\max}$. To see this,
note that because the function $c(A(t), X(t))$ in (7) is non-decreasing in every entry
of $A(t)$ for each $X(t) \in \mathcal{X}$, the purchasing policy (28)-(29) yields $A_m(t) = 0$ for
any queue $m$ that satisfies $Q_m(t) > \theta_m$. It follows that $Q_m(t)$ cannot increase if
it is greater than $\theta_m$, and so $Q_m(t+1) \le \theta_m + A_{m,\max}$ (because $A_{m,\max}$ is the
maximum amount of increase for queue $m$ on any slot). □

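The following important related lemma shows that queue sizes $Q_m(t)$ are always
above $\mu_{m,\max}$, provided that they start with at least this much raw material and
that the $\theta_m$ values are chosen to be sufficiently large. Specifically, define $\theta_m$ as
follows:

\[ \theta_m \triangleq \max_{\{k \in \{1,\dots,K\} \mid \beta_{mk} > 0\}} \left[ \frac{V(P_{k,\max} - \alpha_k)}{\beta_{mk}} + \frac{\sum_{i \in \{1,\dots,M\}, i \ne m} \beta_{ik} A_{i,\max}}{\beta_{mk}} + 2\mu_{m,\max} \right] \tag{32} \]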

Lemma 3. Suppose that $\theta_m$ is defined by (32), and that $Q_m(0) \ge \mu_{m,\max}$ for all
$m \in \{1,\dots,M\}$. Then for all slots $t \ge 0$ we have:

\[ Q_m(t) \ge \mu_{m,\max} \quad \forall m \in \{1,\dots,M\} \]

Proof. Fix $m \in \{1,\dots,M\}$, and suppose that $Q_m(t) \ge \mu_{m,\max}$ for some slot
$t$ (this holds by assumption for $t = 0$). We prove it also holds for slot $t + 1$.
If $Q_m(t) \ge 2\mu_{m,\max}$, then $Q_m(t+1) \ge \mu_{m,\max}$ (because at most $\mu_{m,\max}$ units
can depart queue $m$ on any slot), and hence we are done. Suppose now that
$\mu_{m,\max} \le Q_m(t) < 2\mu_{m,\max}$. In this case the pricing functional in (30) satisfies the
following for any product $k$ such that $\beta_{mk} > 0$:

\[ V(P_k(t) - \alpha_k) F_k(P_k(t), Y(t)) + F_k(P_k(t), Y(t)) \sum_{i=1}^{M} \beta_{ik}(Q_i(t) - \theta_i) \]
\[ \le F_k(P_k(t), Y(t)) \times \Big[ V(P_{k,\max} - \alpha_k) + \sum_{i \in \{1,\dots,M\}, i \ne m} \beta_{ik} A_{i,\max} + \beta_{mk}(2\mu_{m,\max} - \theta_m) \Big] \tag{33} \]
\[ \le 0 \tag{34} \]

where in (33) we have used the fact that $(Q_i(t) - \theta_i) \le A_{i,\max}$ for all queues $i$ (and
in particular for all $i \ne m$) by Lemma 2. In (34) we have used the definition of $\theta_m$
in (32). It follows that the pricing rule (30)-(31) sets $Z_k(t) = 0$ for all products $k$
that use raw material $m$, and so no departures can take place from $Q_m(t)$ on the
current slot. Thus: $\mu_{m,\max} \le Q_m(t) \le Q_m(t+1)$, and we are done. □
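
As an illustration, the threshold $\theta_m$ can be computed directly from the system constants. The sketch below assumes (32) has the form $\theta_m = \max_{k: \beta_{mk} > 0} [ (V(P_{k,\max} - \alpha_k) + \sum_{i \ne m} \beta_{ik} A_{i,\max}) / \beta_{mk} + 2\mu_{m,\max} ]$, which is the form that makes the bound (33)-(34) non-positive. The function name and data layout are hypothetical, and returning 0 for a material that no product uses is our own convention.

```python
from typing import Sequence


def theta_m(m: int, V: float, beta: Sequence[Sequence[float]],
            alpha: Sequence[float], P_max: Sequence[float],
            A_max: Sequence[float], mu_max: Sequence[float]) -> float:
    """Threshold theta_m under the assumed form of (32); beta[i][k] is beta_ik."""
    M, K = len(beta), len(beta[0])
    candidates = []
    for k in range(K):
        if beta[m][k] > 0:  # only products that actually use material m
            others = sum(beta[i][k] * A_max[i] for i in range(M) if i != m)
            candidates.append(
                (V * (P_max[k] - alpha[k]) + others) / beta[m][k] + 2 * mu_max[m]
            )
    # Convention (an assumption): a material used by no product gets theta = 0.
    return max(candidates, default=0.0)
```

Because $V$ enters only through the first term, the returned value grows linearly in $V$, which is what makes the buffer bound $Q_{m,\max} = \theta_m + A_{m,\max}$ scale as $O(V)$.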

4.5 Performance Analysis of JPP 

Theorem 3. (Performance of JPP) Suppose that $\theta_m$ is defined by (32) for all
$m \in \{1,\dots,M\}$, and that $\mu_{m,\max} \le Q_m(0) \le \theta_m + A_{m,\max}$
for all $m \in \{1,\dots,M\}$.
Suppose $X(t)$ and $Y(t)$ are i.i.d. over slots. Then under the JPP algorithm
implemented with any parameter $V > 0$, we have:

(a) For all $m \in \{1,\dots,M\}$ and all slots $t \ge 0$:

\[ \mu_{m,\max} \le Q_m(t) \le Q_{m,\max} \triangleq \theta_m + A_{m,\max} \]

where $Q_{m,\max} = O(V)$.

(b) For all slots $t > 0$ we have:

\[ \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi_{actual}(\tau)\} \ge \phi^{opt} - \frac{B}{V} - \frac{\mathbb{E}\{L(Q(0))\}}{Vt} \tag{35} \]

where the constant $B$ is defined in (23) and is independent of $V$, and where $\phi^{opt}$ is
the optimal time average profit defined in Theorem 1.

(c) The time average profit converges with probability 1, and satisfies:

\[ \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \phi_{actual}(\tau) \ge \phi^{opt} - \frac{B}{V} \quad \text{(with probability 1)} \tag{36} \]

Thus, the time average profit is within $O(1/V)$ of optimal, and hence can be
pushed arbitrarily close to optimal by increasing $V$, with a tradeoff in the maximum
buffer size that is $O(V)$. Defining $\epsilon = B/V$ yields the desired $[O(\epsilon), O(1/\epsilon)]$ profit-
buffer size tradeoff.
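
To make the tradeoff concrete, the constant $B$ and the value of $V$ needed for a target optimality gap $\epsilon$ can be computed as below. The sketch assumes (23) reads $B = \frac{1}{2}\sum_m \max[A_{m,\max}^2, \mu_{m,\max}^2]$; the function names are hypothetical.

```python
from typing import Sequence


def drift_constant_B(A_max: Sequence[float], mu_max: Sequence[float]) -> float:
    """Constant B under the assumed form of (23)."""
    return 0.5 * sum(max(a * a, mu * mu) for a, mu in zip(A_max, mu_max))


def V_for_gap(eps: float, A_max: Sequence[float], mu_max: Sequence[float]) -> float:
    """Smallest V for which the steady-state gap B / V in (36) is at most eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return drift_constant_B(A_max, mu_max) / eps
```

For instance, with `A_max = [3, 2]` and `mu_max = [4, 1]` the constant is `B = 10.0`, so a gap of `eps = 0.5` calls for `V = 20.0`; halving the gap doubles $V$ and therefore roughly doubles the buffer bound.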

Proof. (Theorem 3 parts (a) and (b)) Part (a) follows immediately from Lemmas
2 and 3. To prove part (b), note that JPP observes $Q(t)$ and makes control
decisions for $Z_k(t)$, $A(t)$, $P_k(t)$ that minimize the right hand side of (22) over all
alternative choices. Thus:

\[ \Delta(Q(t)) - V \mathbb{E}\{\phi(t) \mid Q(t)\} \le B - V \mathbb{E}\{\phi^*(t) \mid Q(t)\} + \sum_{m=1}^{M} (Q_m(t) - \theta_m) \mathbb{E}\{A_m^*(t) - \mu_m^*(t) \mid Q(t)\} \tag{37} \]

where $\mathbb{E}\{\phi^*(t) \mid Q(t)\}$ and $\mathbb{E}\{A_m^*(t) - \mu_m^*(t) \mid Q(t)\}$ correspond to any alternative choices for
the decision variables $Z_k^*(t)$, $P_k^*(t)$, $A_m^*(t)$ subject to the same constraints, namely
that $P_k^*(t) \in \mathcal{P}_k$, $A_m^*(t)$ satisfies (8)-(10), $Z_k^*(t) \in \{0,1\}$, and $Z_k^*(t) = 0$
whenever $1_k(t) = 1$. Because $Q_m(t) \ge \mu_{m,\max}$ for all $m$, we have $1_k(t) = 0$ for all
$k \in \{1,\dots,K\}$ (where $1_k(t)$ is defined in (16)). Thus, the $(X, Y)$-only policy of
Corollary 1 satisfies the desired constraints. Further, this policy makes decisions
based only on $(X(t), Y(t))$, which are i.i.d. over slots and hence independent of the
current queue state $Q(t)$. Thus, from (12)-(13) we have:

\[ \mathbb{E}\{\phi^*(t) \mid Q(t)\} = \mathbb{E}\{\phi^*(t)\} = \phi^{opt} \]
\[ \mathbb{E}\{A_m^*(t) - \mu_m^*(t) \mid Q(t)\} = \mathbb{E}\{A_m^*(t) - \mu_m^*(t)\} = 0 \quad \forall m \in \{1,\dots,M\} \]

Plugging the above two identities into the right hand side of (37) yields:

\[ \Delta(Q(t)) - V \mathbb{E}\{\phi(t) \mid Q(t)\} \le B - V \phi^{opt} \tag{38} \]

Taking expectations of the above and using the definition of $\Delta(Q(t))$ yields:

\[ \mathbb{E}\{L(Q(t+1))\} - \mathbb{E}\{L(Q(t))\} - V \mathbb{E}\{\phi(t)\} \le B - V \phi^{opt} \]

The above holds for all slots $t \ge 0$. Summing over $\tau \in \{0, 1, \dots, t-1\}$ for some
integer $t > 0$ yields:

\[ \mathbb{E}\{L(Q(t))\} - \mathbb{E}\{L(Q(0))\} - V \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi(\tau)\} \le Bt - V t \phi^{opt} \]

Dividing by $tV$, rearranging terms, and using non-negativity of $L(Q(t))$ yields:

\[ \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi(\tau)\} \ge \phi^{opt} - \frac{B}{V} - \frac{\mathbb{E}\{L(Q(0))\}}{Vt} \tag{39} \]

Because $\tilde{D}_k(\tau) = D_k(\tau)$ for all $k$ and all $\tau$, we have $\phi(\tau) = \phi_{actual}(\tau)$ for all $\tau$, and we
are done. □



Proof. (Theorem 3 part (c)) Taking a limit of (39) proves that:

\[ \liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi(\tau)\} \ge \phi^{opt} - B/V \]

The above result made no assumption on the initial distribution of $Q(0)$. Thus,
letting $Q(0)$ be any particular initial state, we have:

\[ \liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi(\tau) \mid Q(0)\} \ge \phi^{opt} - B/V \tag{40} \]

However, under the JPP algorithm the process $Q(t)$ is a discrete time Markov chain
with finite state space (because it is an integer valued vector with finite bounds given
in part (a)). Suppose now the initial condition $Q(0)$ is a recurrent state. It follows
that the time average of $\phi(t)$ must converge to a well defined constant $\overline{\phi}(Q(0))$
with probability 1 (where the constant may depend on the initial recurrent state
$Q(0)$ that is chosen). Further, because $\phi(\tau)$ is bounded above and below by finite
constants for all $\tau$, by the Lebesgue dominated convergence theorem we have:

\[ \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi(\tau) \mid Q(0)\} = \overline{\phi}(Q(0)) \]

Using this in (40) yields:

\[ \overline{\phi}(Q(0)) \ge \phi^{opt} - B/V \]

This is true for all initial states that are recurrent. If $Q(0)$ is a transient state, then
$Q(t)$ eventually reaches a recurrent state and hence achieves a time average that
is, with probability 1, greater than or equal to $\phi^{opt} - B/V$. □
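
Part (a) of Theorem 3 is a sample path statement, so it can be checked numerically on a toy instance. The sketch below simulates JPP with a single material and a single product ($M = K = 1$). Everything specific in it is an assumption made for illustration: the linear purchase cost $x \cdot A$, the price set, the demand model (a binomial count with mean `F(p, y)`), the i.i.d. state distributions, and the one-product form of $\theta$ taken from (32).

```python
import random


def simulate_jpp(V: float = 10.0, slots: int = 5000, seed: int = 0):
    """Run JPP on a toy M = K = 1 system; return (min Q, max Q, Q_max bound, avg profit)."""
    rng = random.Random(seed)
    beta, alpha, d_max, a_max = 1, 1.0, 2, 3        # hypothetical system constants
    prices = [2.0, 3.0, 4.0]                        # finite price set
    mu_max = beta * d_max                           # most units that can depart in a slot
    theta = V * (max(prices) - alpha) / beta + 2 * mu_max   # assumed (32) with M = 1

    def F(p: float, y: float) -> float:
        """Hypothetical expected demand at price p in demand state y (at most d_max)."""
        return d_max * y * (5.0 - p) / 4.0

    def objective(p: float, y: float, Q: float) -> float:
        """Pricing functional (30) for the single product."""
        return (V * (p - alpha) + beta * (Q - theta)) * F(p, y)

    Q = mu_max                                      # start at the lower edge
    q_lo = q_hi = Q
    profit = 0.0
    for _ in range(slots):
        x = rng.choice([1.0, 2.0])                  # unit material cost state X(t)
        y = rng.choice([0.5, 1.0])                  # demand state Y(t)
        A = a_max if V * x + Q - theta < 0 else 0   # purchasing rule for linear cost
        sold, revenue = 0, 0.0
        if Q >= mu_max:                             # edge constraint (17)
            p = max(prices, key=lambda c: objective(c, y, Q))
            if objective(p, y, Q) > 0:              # offer only if (30) is positive
                sold = sum(rng.random() < F(p, y) / d_max for _ in range(d_max))
                revenue = sold * (p - alpha)
        profit += revenue - x * A
        Q = Q - beta * sold + A                     # queue update (19)
        q_lo, q_hi = min(q_lo, Q), max(q_hi, Q)
    return q_lo, q_hi, theta + a_max, profit / slots
```

On any run, the first two returned values should stay between $\mu_{m,\max}$ (here 2) and the third value $\theta + A_{\max}$, regardless of the seed, since the bounds are deterministic.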

4.6 Place-Holder Values 

Theorem 3 seems to require the initial queue values to satisfy $Q_m(0) \ge \mu_{m,\max}$ for
all $m$. This suggests that we need to purchase that many raw materials before start-up.
Here we use the place holder backlog technique of [Neely and Urgaonkar 2008]
to show that we can achieve the same performance without this initial start-up cost,
without loss of optimality. The technique also allows us to reduce our maximum
buffer size requirement $Q_{m,\max}$ by an amount $\mu_{m,\max}$ for all $m \in \{1,\dots,M\}$, with
no loss in performance.

To do this, we start the system off with exactly $\mu_{m,\max}$ units of fake raw material
in each queue $m \in \{1,\dots,M\}$. Let $Q_m(t)$ be the total raw material in queue $m$,
including both the actual and fake material. Let $Q_m^{actual}(t)$ be the amount of actual
raw material in queue $m$. Then for slot 0 we have:

\[ Q_m(0) = Q_m^{actual}(0) + \mu_{m,\max} \quad \forall m \in \{1,\dots,M\} \]

Assume that $\mu_{m,\max} \le Q_m(0) \le \theta_m + A_{m,\max}$ for all $m \in \{1,\dots,M\}$, as needed
for Theorem 3. Thus, $0 \le Q_m^{actual}(0) \le \theta_m + A_{m,\max} - \mu_{m,\max}$ (so that the actual
initial condition can be 0). We run the JPP algorithm as before, using the $Q(t)$
values (not the actual queue values). However, if we are ever asked to assemble a
product, we use actual raw materials whenever possible. The only problem comes
if we are asked to assemble a product for which there are not enough actual raw
materials available. However, we know from Theorem 3 that the queue value $Q_m(t)$
never decreases below $\mu_{m,\max}$ for any $m \in \{1,\dots,M\}$. It follows that we are never
asked to use any of our fake raw material. Therefore, the fake raw material stays
untouched in each queue for all time, and we have:

\[ Q_m(t) = Q_m^{actual}(t) + \mu_{m,\max} \quad \forall m \in \{1,\dots,M\}, \; \forall t \ge 0 \tag{41} \]

The sample path of the system is equivalent to a sample path that does not use
fake raw material, but starts the system in the non-zero initial condition $Q(0)$.
Hence, the resulting profit achieved is the same. Because the limiting time average
profit does not depend on the initial condition, the time average profit is still at
least $\phi^{opt} - B/V$. However, by (41) the actual amount of raw material held is
reduced by exactly $\mu_{m,\max}$ on each slot, which also reduces the maximum buffer
size $Q_{m,\max}$ by exactly this amount.
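
The bookkeeping behind the place-holder technique is small enough to sketch directly. The helper names below are hypothetical; the only logic used is the relation (41) between the virtual queue that JPP observes and the material actually held.

```python
from typing import List, Sequence


def init_virtual_queues(actual0: Sequence[int], mu_max: Sequence[int]) -> List[int]:
    """Virtual initial queues: actual stock plus mu_max units of place-holder material."""
    return [a + mu for a, mu in zip(actual0, mu_max)]


def actual_from_virtual(Q: Sequence[int], mu_max: Sequence[int]) -> List[int]:
    """Recover the real inventory from the virtual queues via (41)."""
    actual = [q - mu for q, mu in zip(Q, mu_max)]
    if min(actual) < 0:
        # Cannot occur under JPP, since Q_m(t) never drops below mu_max.
        raise ValueError("virtual queue fell below the place-holder level")
    return actual
```

JPP is then run on the virtual queues, so an empty warehouse (`actual0` all zeros) already satisfies the initial condition of Theorem 3.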

4.7 Demand-Blind Pricing 

As in [Huang and Neely 2007] for the service provider problem, here we consider
the special case when the demand function $F_k(P_k(t), Y(t))$ has the form:

\[ F_k(P_k(t), Y(t)) = h_k(Y(t)) \hat{F}_k(P_k(t)) \tag{42} \]

for some non-negative functions $h_k(Y(t))$ and $\hat{F}_k(P_k(t))$. This holds, for example,
when $Y(t)$ represents the integer number of customers at time $t$, and $\hat{F}_k(p)$
is the expected demand at price $p$ for each customer, so that $F_k(P_k(t), Y(t)) = Y(t)\hat{F}_k(P_k(t))$.
Under the structure (42), the JPP pricing algorithm (30) reduces
to choosing $P_k(t) \in \mathcal{P}_k$ to maximize:

\[ V(P_k(t) - \alpha_k) \hat{F}_k(P_k(t)) + \hat{F}_k(P_k(t)) \sum_{m=1}^{M} \beta_{mk}(Q_m(t) - \theta_m) \]

Thus, the pricing can be done without knowledge of the demand state $Y(t)$.
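
A short sketch shows why the demand state drops out: multiplying the objective by a positive factor $h_k(Y(t))$ does not change which price maximizes it. The function name, the finite price list, and the scalar `weight` (standing for $\sum_m \beta_{mk}(Q_m(t) - \theta_m)$) are assumptions made for this example.

```python
from typing import Callable, Sequence


def demand_blind_price(V: float, alpha_k: float, weight: float,
                       prices: Sequence[float],
                       F_hat: Callable[[float], float]) -> float:
    """Price maximizing (V (p - alpha_k) + weight) * F_hat(p) over a finite price set.

    `weight` is sum_m beta_mk (Q_m - theta_m); the demand state y is never needed.
    """
    return max(prices, key=lambda p: (V * (p - alpha_k) + weight) * F_hat(p))
```

The sign of the maximized value is also unaffected by a positive $h_k(Y(t))$, so the decision of whether to offer the product can likewise be made without observing $Y(t)$.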

4.8 Extension to 1-slot Assembly Delay 

Consider now the modified situation where products require one slot for assembly, 
but where consumers still require a product to be provided on the same slot in which 
it is purchased. This can easily be accommodated by maintaining an additional set 
of K product queues for storing finished products. Specifically, each product queue 
$k$ is initialized with $D_{k,\max}$ units of finished products. The plant also keeps the
same material queues $Q(t)$ as before, and makes all control decisions exactly as
before (ignoring the product queues), so that every sample path of queue levels 
and control variables is the same as before. However, when new products are 
purchased, the consumers do not wait for assembly, but take the corresponding 
amount of products out of the product queues. This exact amount is replenished 
when the new products complete their assembly at the end of the slot. Thus, at the 
beginning of every slot there are always exactly $D_{k,\max}$ units of type $k$ products in
the product queues. The total profit is then the same as before, with the exception 
that the plant incurs a fixed startup cost associated with initializing all product 
queues with a full amount of finished products. 

4.9 Extension to Price-Vector Based Demands 

Suppose that the demand function $F_k(P_k(t), Y(t))$ associated with product $k$ is
changed to a function $F_k(P(t), Z(t), Y(t))$ that depends on the full price vectors
$Z(t)$ and $P(t)$. This does not significantly change the analysis of the Maximum
Profit Theorem (Theorem 1) or of the dynamic control policy of this section. Specifically,
the maximum time average profit $\phi^{opt}$ is still characterized by randomized
$(X, Y)$-only algorithms, with the exception that the new price-vector based demand
function is used in the optimization of Theorem 1. Similarly, the dynamic control
policy of this section is only changed by replacing the original demand function
with the new demand function.

However, the 2-price result of Theorem 2 would no longer apply in this setting.
This is because Theorem 2 uses a strategy of independent pricing that only applies
if the demand function for product $k$ depends only on $P_k(t)$ and $Y(t)$. In the case
when demands are affected by the full price vector, a modified analysis can show
that $\phi^{opt}$ can be achieved over stationary randomized algorithms that use at most
$\min[K, M] + 1$ price vectors $(Z^{(i)}, P^{(i)})$ for each demand state $Y(t) \in \mathcal{Y}$.

5. ERGODIC MODELS 

In this section, we consider the performance of the Joint Purchasing and Pricing
Algorithm (JPP) under a more general class of non-i.i.d. consumer demand state
$Y(t)$ and material supply state $X(t)$ processes that possess the decaying memory
property (defined below). In this case, the deterministic queue bounds in part (a)
of Theorem 3 still hold. This is because part (a) is a sample path statement, which
holds under any arbitrary $Y(t)$ and $X(t)$ processes. Hence we only have to look at
the profit performance.

5.1 The Decaying Memory Property 

In this case, we first assume that the stochastic processes $Y(t)$ and $X(t)$ both have
well defined time averages. Specifically, we assume that:

\[ \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} 1\{Y(\tau) = y\} = \pi(y) \quad \text{with probability 1} \tag{43} \]
\[ \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} 1\{X(\tau) = x\} = \pi(x) \quad \text{with probability 1,} \tag{44} \]

where $\pi(y)$ and $\pi(x)$ are the same as in the i.i.d. case for all $x$ and $y$. Now we
consider implementing the $(X, Y)$-only policy in Corollary 1. Because this policy
makes decisions every slot purely as a function of the current states $X(t)$ and $Y(t)$,
and because the limiting fractions of time of being in states $x$, $y$ are the same as in
the i.i.d. case, we see that Corollary 1 still holds if we take the limit as $t$ goes to
infinity, i.e.:

\[ \phi^{opt} = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi^*(\tau)\} \tag{45} \]
\[ 0 = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\Big\{ A_m^*(\tau) - \sum_{k=1}^{K} \beta_{mk} Z_k^*(\tau) F_k(P_k^*(\tau), Y(\tau)) \Big\} \quad \forall m \tag{46} \]

where $\mathbb{E}\{\phi^*(\tau)\}$, $\mathbb{E}\{A_m^*(\tau)\}$ and $\mathbb{E}\{\sum_{k=1}^{K} \beta_{mk} Z_k^*(\tau) F_k(P_k^*(\tau), Y(\tau))\}$ are defined
as in Corollary 1, with expectations taken over the distributions of $X(\tau)$ and $Y(\tau)$
at time $\tau$ and the possible randomness of the policy.

We now define $H(t)$ to be the system history up to time slot $t$ as follows:

\[ H(t) \triangleq \{X(\tau), Y(\tau)\}_{\tau=0}^{t-1} \cup \{[Q_m(\tau)]_{m=1}^{M}\}_{\tau=0}^{t}. \tag{47} \]

We say that the state processes $X(t)$ and $Y(t)$ have the decaying memory property
if for any small $\epsilon > 0$, there exists a positive integer $T = T_\epsilon$, i.e., $T$ is a function of
$\epsilon$, such that for any $t_0 \in \{0, 1, 2, \dots\}$ and any $H(t_0)$, the following holds under the
$(X, Y)$-only policy in Corollary 1:

\[ \left| \phi^{opt} - \frac{1}{T} \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi^*(\tau) \mid H(t_0)\} \right| \le \epsilon \tag{48} \]
\[ \left| \frac{1}{T} \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\Big\{ A_m^*(\tau) - \sum_{k=1}^{K} \beta_{mk} Z_k^*(\tau) F_k(P_k^*(\tau), Y(\tau)) \,\Big|\, H(t_0) \Big\} \right| \le \epsilon \quad \forall m \tag{49} \]

It is easy to see that if $X(t)$ and $Y(t)$ both evolve according to a finite state ergodic
Markov chain, then the above will be satisfied. If $X(t)$ and $Y(t)$ are i.i.d. over
slots, then $T_\epsilon = 1$ for all $\epsilon > 0$.
5.2 Performance of JPP under the Ergodic Model

We now present the performance result of JPP under this decaying memory property.

Theorem 4. Suppose the Joint Purchasing and Pricing Algorithm (JPP) is
implemented, with $\theta_m$ satisfying condition (32), and that $\mu_{m,\max} \le Q_m(0) \le \theta_m + A_{m,\max}$
for all $m \in \{1, 2, \dots, M\}$. Then the queue backlog values $Q_m(t)$ for all
$m \in \{1,\dots,M\}$ satisfy part (a) of Theorem 3. Further, for any $\epsilon > 0$ and $T$ such
that (48) and (49) hold, we have that:

\[ \liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi(\tau)\} \ge \phi^{opt} - \frac{TB}{V} - \epsilon \left( 1 + \sum_{m=1}^{M} \frac{\max[\theta_m, A_{m,\max}]}{V} \right), \tag{50} \]

where $B$ is defined in (23).

To understand this result, note that the coefficient multiplying the $\epsilon$ term in the
right hand side of (50) is $O(1)$ (recall that $\theta_m$ in (32) is linear in $V$). Thus, for a
given $\epsilon > 0$, the final term is $O(\epsilon)$. Let $T = T_\epsilon$ represent the required mixing time
to achieve (48)-(49) for the given $\epsilon$, and choose $V = T_\epsilon/\epsilon$. Then by (50), we are
within $\epsilon B + O(\epsilon) = O(\epsilon)$ of the optimal time average profit $\phi^{opt}$, with buffer size
$O(V) = O(T_\epsilon/\epsilon)$. For i.i.d. processes $X(t)$, $Y(t)$, we have $T_\epsilon = 1$ for all $\epsilon > 0$, and
so the buffer size is $O(1/\epsilon)$. For processes $X(t)$, $Y(t)$ that are modulated by finite
state ergodic Markov chains, it can be shown that $T_\epsilon = O(\log(1/\epsilon))$, and so the
buffer size requirement is $O((1/\epsilon)\log(1/\epsilon))$ [Urgaonkar and Neely 2009] [Huang and
Neely 2007].

To prove the theorem, it is useful to define the following notions. Using the same
Lyapunov function in (20) and a positive integer $T$, we define the $T$-slot Lyapunov
drift as follows:

\[ \Delta_T(H(t)) \triangleq \mathbb{E}\{L(Q(t+T)) - L(Q(t)) \mid H(t)\}, \tag{51} \]

where $H(t)$ is defined in (47) as the past history up to time $t$. It is also useful to
define the following notion:

\[ \Delta_T(t) \triangleq \mathbb{E}\{L(Q(t+T)) - L(Q(t)) \mid H_T(t)\}, \tag{52} \]

where:

\[ H_T(t) \triangleq \{H(t)\} \cup \{X(t), Y(t), \dots, X(t+T-1), Y(t+T-1)\} \cup \{Q(t)\} \tag{53} \]

The value $H_T(t)$ represents all history $H(t)$, and additionally includes the sequence
of realizations of the supply and demand states in the interval $\{t, t+1, \dots, t+T-1\}$.
It also includes the backlog vector $Q(t)$ (this is already included in $H(t)$, but
we explicitly include it again in (53) for convenience). Given these values, the
expectation in (52) is with respect to the random demand outcomes $D_k(t)$ and the
possibly randomized control actions. We have the following lemma:

Lemma 4. Suppose the JPP algorithm is implemented with the $\theta_m$ values satisfying
(32) and $\mu_{m,\max} \le Q_m(0) \le \theta_m + A_{m,\max}$ for all $m \in \{1, 2, \dots, M\}$. Then for
any $t_0 \in \{0, 1, 2, \dots\}$, any integer $T$, any $H_T(t_0)$, and any $Q(t_0)$ value, we have:

\[ \Delta_T(t_0) - V \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi(\tau) \mid H_T(t_0)\} \le T^2 B - V \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi^*(\tau) \mid H_T(t_0)\} \]
\[ \qquad + \sum_{m=1}^{M} (Q_m(t_0) - \theta_m) \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\Big\{ A_m^*(\tau) - \sum_{k=1}^{K} \beta_{mk} Z_k^*(\tau) F_k(P_k^*(\tau), Y(\tau)) \,\Big|\, H_T(t_0) \Big\}, \tag{54} \]

with $B$ defined in (23), and where $\phi^*(\tau)$, $A_m^*(\tau)$, $Z_k^*(\tau)$ and $P_k^*(\tau)$ are variables generated
by any other policy that can be implemented over the $T$ slot interval (including those
that know the future $X(\tau)$, $Y(\tau)$ states in this interval).

Proof. See Appendix C. □ 

We now prove Theorem 4. 

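Proof. (Theorem 4) Fix any $t_0 \ge 0$. Taking expectations on both sides of (54)
(conditioning on the information $H(t_0)$ that is already included in $H_T(t_0)$) yields:

\[ \Delta_T(H(t_0)) - V \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi(\tau) \mid H(t_0)\} \le T^2 B - V \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi^*(\tau) \mid H(t_0)\} \]
\[ \qquad + \sum_{m=1}^{M} (Q_m(t_0) - \theta_m) \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\Big\{ A_m^*(\tau) - \sum_{k=1}^{K} \beta_{mk} Z_k^*(\tau) F_k(P_k^*(\tau), Y(\tau)) \,\Big|\, H(t_0) \Big\}. \]

Now plugging in the policy in Corollary 1 and using the $\epsilon$ and $T$ that yield (48)
and (49) in the above, and using the fact that $|Q_m(t_0) - \theta_m| \le \max[\theta_m, A_{m,\max}]$,
we have:

\[ \Delta_T(H(t_0)) - V \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi(\tau) \mid H(t_0)\} \le T^2 B - V T \phi^{opt} + V T \epsilon + T \sum_{m=1}^{M} \max[\theta_m, A_{m,\max}] \epsilon. \tag{55} \]

We can now take expectations of (55) over $H(t_0)$ to obtain:

\[ \mathbb{E}\{L(Q(t_0+T)) - L(Q(t_0))\} - V \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi(\tau)\} \le T^2 B - V T \phi^{opt} + V T \epsilon + T \sum_{m=1}^{M} \max[\theta_m, A_{m,\max}] \epsilon. \]

Summing the above over $t_0 = 0, T, 2T, \dots, (J-1)T$ for some positive $J$ and dividing
both sides by $VTJ$, we get:

\[ \frac{\mathbb{E}\{L(Q(JT)) - L(Q(0))\}}{VTJ} - \frac{1}{JT} \sum_{\tau=0}^{JT-1} \mathbb{E}\{\phi(\tau)\} \le \frac{TB}{V} - \phi^{opt} + \epsilon + \sum_{m=1}^{M} \frac{\max[\theta_m, A_{m,\max}] \epsilon}{V}. \tag{56} \]

By rearranging terms and using the fact that $L(Q(t)) \ge 0$ for all $t$, we obtain:

\[ \frac{1}{JT} \sum_{\tau=0}^{JT-1} \mathbb{E}\{\phi(\tau)\} \ge \phi^{opt} - \frac{TB}{V} - \epsilon \left( 1 + \sum_{m=1}^{M} \frac{\max[\theta_m, A_{m,\max}]}{V} \right) - \frac{\mathbb{E}\{L(Q(0))\}}{VTJ} \]

Taking the liminf as $J \to \infty$ (and noting that, because $\phi(\tau)$ is bounded, the liminf
along $t = JT$ equals the liminf over all $t$), we have:

\[ \liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{\phi(\tau)\} \ge \phi^{opt} - \frac{TB}{V} - \epsilon \left( 1 + \sum_{m=1}^{M} \frac{\max[\theta_m, A_{m,\max}]}{V} \right). \tag{57} \]

This completes the proof of Theorem 4. □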

6. ARBITRARY SUPPLY AND DEMAND PROCESSES 

In this section, we further relax our assumption about the supply and demand
processes to allow for arbitrary $X(t)$ and $Y(t)$ processes, and we look at the
performance of the JPP algorithm. Note that in this case, the notion of "optimal" time
average profit may no longer be applicable. Thus, we instead compare the
performance of JPP with the optimal value that one can achieve over a time interval
of length $T$. This optimal value is defined to be the supremum of the achievable
average profit over any policy, including those which know the entire realizations of
$X(t)$ and $Y(t)$ over the $T$ slots at the very beginning of the interval. We will call
such a policy a $T$-slot Lookahead policy in the following. We will show that in this
case, JPP's performance is close to that under an optimal $T$-slot Lookahead policy.
This $T$-slot lookahead metric is similar to the one used in [Neely 2009][Neely 2010],
with the exception that here we compare to policies that know only the $X(t)$ and
$Y(t)$ realizations and not the demand $D_k(t)$ realizations.

6.1 The T-slot Lookahead Performance 

Let $T$ be a positive integer and let $t_0 \ge 0$. Define $\phi_T(t_0)$ as the optimal expected
profit achievable over the interval $\{t_0, t_0+1, \dots, t_0+T-1\}$ by any policy that has
the complete knowledge of the entire $X(t)$ and $Y(t)$ processes over this interval
and that ensures that the quantity of the raw materials purchased over the interval
is equal to the expected amount consumed. Note here that although the future
$X(t)$, $Y(t)$ values are assumed to be known, the random demands $D_k(t)$ are still
unknown. Mathematically, $\phi_T(t_0)$ can be defined as the solution to the following
optimization problem:

(P1)

\[ \max: \quad \phi_T(t_0) = \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi(\tau) \mid H_T(t_0)\} \tag{58} \]
\[ \text{s.t.} \quad \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\Big\{ A_m(\tau) - \sum_{k=1}^{K} \beta_{mk} Z_k(\tau) F_k(P_k(\tau), Y(\tau)) \,\Big|\, H_T(t_0) \Big\} = 0, \quad \forall m \tag{59} \]
\[ \text{Constraints (2), (8), (9), (10).} \tag{60} \]

Here $H_T(t_0)$ is defined in (53) and includes the sequence of realizations of $X(t)$ and
$Y(t)$ during the interval $\{t_0, \dots, t_0+T-1\}$; $\phi(\tau)$ is defined in Equation (11) as the
instantaneous profit obtained at time $\tau$; $A_m(\tau)$ and $\sum_{k=1}^{K} \beta_{mk} Z_k(\tau) F_k(P_k(\tau), Y(\tau))$
are the number of newly ordered parts and the expected number of consumed parts
in time $\tau$, respectively; and the expectation is taken over the randomness in $D_k(\tau)$,
due to the fact that the demand at time $\tau$ is a random variable. We note that in
Constraint (59), we actually do not require that in every intermediate step the raw
material queues have enough for production. This means that a $T$-slot Lookahead
policy can make products out of its future raw materials, provided that they are
purchased before the interval ends. Since purchasing no materials and selling no
products over the entire interval is a valid policy, we see that the value $\phi_T(t_0) \ge 0$
for all $t_0$ and all $T$.

In the following, we will look at the performance of JPP over the interval from
0 to $JT-1$, which is divided into a total of $J$ frames with length $T$ each. We show that
for any $J > 0$, the JPP algorithm yields an average profit over $\{0, 1, \dots, JT-1\}$ that
is close to the profit obtained with an optimal $T$-slot Lookahead policy implemented
on each $T$-slot frame.

6.2 Performance of JPP under arbitrary supply and demand 

The following theorem summarizes the results. 

Theorem 5. Suppose the Joint Purchasing and Pricing Algorithm (JPP) is
implemented, with $\theta_m$ satisfying condition (32) and that $\mu_{m,\max} \le Q_m(0) \le \theta_m + A_{m,\max}$
for all $m \in \{1, 2, \dots, M\}$. Then for any arbitrary $X(t)$ and $Y(t)$ processes,
the queue backlog values satisfy part (a) of Theorem 3. Moreover, for any
positive integers $J$ and $T$, and any $H_{JT}(0)$ (which specifies the initial queue vector
$Q(0)$ according to the above bounds, and specifies all $X(\tau)$, $Y(\tau)$ values for
$\tau \in \{0, 1, \dots, JT-1\}$), the time average profit over the interval $\{0, 1, \dots, JT-1\}$
satisfies:

\[ \frac{1}{JT} \sum_{\tau=0}^{JT-1} \mathbb{E}\{\phi(\tau) \mid H_{JT}(0)\} \ge \frac{1}{JT} \sum_{j=0}^{J-1} \phi_T(jT) - \frac{BT}{V} - \frac{L(Q(0))}{VJT} \]

where $\phi_T(jT)$ is defined to be the optimal value of the problem (P1) over the interval
$\{jT, \dots, (j+1)T-1\}$. The constant $B$ is defined in (23).

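Proof. (Theorem 5) Fix any $t_0 \ge 0$. We denote the optimal solution to the
problem (P1) over the interval $\{t_0, t_0+1, \dots, t_0+T-1\}$ as:

\[ \{\phi^*(\tau), A_m^*(\tau), Z_k^*(\tau), P_k^*(\tau)\}_{\tau=t_0}^{t_0+T-1}. \]

Now recall (54) as follows:

\[ \Delta_T(t_0) - V \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi(\tau) \mid H_T(t_0)\} \le T^2 B - V \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi^*(\tau) \mid H_T(t_0)\} \]
\[ \qquad + \sum_{m=1}^{M} (Q_m(t_0) - \theta_m) \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\Big\{ A_m^*(\tau) - \sum_{k=1}^{K} \beta_{mk} Z_k^*(\tau) F_k(P_k^*(\tau), Y(\tau)) \,\Big|\, H_T(t_0) \Big\}. \tag{61} \]

Now note that the actions $\phi^*(\tau)$, $A_m^*(\tau)$, $Z_k^*(\tau)$ and $P_k^*(\tau)$ satisfy (58)-(59) and so:

\[ \Delta_T(t_0) - V \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi(\tau) \mid H_T(t_0)\} \le T^2 B - V \phi_T(t_0). \tag{62} \]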
Now note that since JPP makes actions based only on the current queue backlog
and $X(t)$, $Y(t)$ states, we have:

\[ \Delta_T(t_0) = \mathbb{E}\{L(Q(t_0+T)) - L(Q(t_0)) \mid H_{JT}(0), Q(t_0)\} \]
\[ \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi(\tau) \mid H_T(t_0)\} = \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi(\tau) \mid H_{JT}(0), Q(t_0)\} \]

That is, conditioning on the additional $X(\tau)$, $Y(\tau)$ states for $\tau$ outside of the $T$-slot
interval does not change the expectations. Using these in (62) yields:

\[ \mathbb{E}\{L(Q(t_0+T)) - L(Q(t_0)) \mid H_{JT}(0), Q(t_0)\} - V \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi(\tau) \mid H_{JT}(0), Q(t_0)\} \le T^2 B - V \phi_T(t_0). \]

Taking expectations of the above with respect to the random $Q(t_0)$ states (given
$H_{JT}(0)$) yields:

\[ \mathbb{E}\{L(Q(t_0+T)) - L(Q(t_0)) \mid H_{JT}(0)\} - V \sum_{\tau=t_0}^{t_0+T-1} \mathbb{E}\{\phi(\tau) \mid H_{JT}(0)\} \le T^2 B - V \phi_T(t_0). \]

Letting $t_0 = jT$ and summing over $j = 0, 1, \dots, J-1$ yields:

\[ \mathbb{E}\{L(Q(JT)) - L(Q(0)) \mid H_{JT}(0)\} - V \sum_{\tau=0}^{JT-1} \mathbb{E}\{\phi(\tau) \mid H_{JT}(0)\} \le J T^2 B - V \sum_{j=0}^{J-1} \phi_T(jT). \]

Rearranging the terms, dividing both sides by $VJT$ and using the fact that $L(Q(t)) \ge 0$
for all $t$, we get:

\[ \frac{1}{JT} \sum_{\tau=0}^{JT-1} \mathbb{E}\{\phi(\tau) \mid H_{JT}(0)\} \ge \frac{1}{JT} \sum_{j=0}^{J-1} \phi_T(jT) - \frac{BT}{V} - \frac{\mathbb{E}\{L(Q(0)) \mid H_{JT}(0)\}}{VJT} \]

Because $Q(0)$ is included in the $H_{JT}(0)$ information, we have $\mathbb{E}\{L(Q(0)) \mid H_{JT}(0)\} = L(Q(0))$.
This proves Theorem 5. □

7. CONCLUSIONS 

We have developed a dynamic pricing and purchasing strategy that achieves time 
average profit that is arbitrarily close to optimal, with a corresponding tradeoff in 
the maximum buffer size required for the raw material queues. When the supply 
and demand states $X(t)$ and $Y(t)$ are i.i.d. over slots, we showed that the profit is
within $O(1/V)$ of optimality, with a worst-case buffer requirement of $O(V)$, where
$V$ is a parameter that can be chosen as desired to affect the tradeoff. Similar
performance was shown for ergodic $X(t)$ and $Y(t)$ processes with a mild decaying
memory property, where the deviation from optimality also depends on a "mixing
time" parameter. Finally, we showed that the same algorithm provides efficient
performance for arbitrary (possibly non-ergodic) $X(t)$ and $Y(t)$ processes. In this
case, efficiency is measured against an ideal $T$-slot lookahead policy with knowledge
of the future $X(t)$ and $Y(t)$ values up to $T$ slots.

Our Joint Purchasing and Pricing (JPP) algorithm reacts to the observed system 
state on every slot, and does not require knowledge of the probabilities associated 
with future states. Our analysis technique is based on Lyapunov optimization, 
and uses a Lyapunov function that ensures enough inventory is available to take 
advantage of emerging favorable demand states. This analysis approach can be 
applied to very large systems, without the curse of dimensionality issues seen by 
other approaches such as dynamic programming. 

Appendix A — Proof of Necessity for Theorem 1 

Proof. (Necessity portion of Theorem 1) For simplicity, we assume the system is 
initially empty. Because X(t) and Y(t) are stationary, we have Pr[X(t) = x] = w(x) 
and Pr[Y(t) = y] = ir(y) for all t and all x 6 X, y E y. Consider any algorithm 
that makes decisions for A(t), Z(t), P(t) over time, and also makes decisions for 
D{t) according to the scheduling constraints (5)-(6). Let <j) ac tuai(t) represent the 
actual instantaneous profit associated with this algorithm. 

Define 4> ac tuai as the limsup time average expectation of factual (t), and let {U} 
represent the subsequence of times over which the limsup is achieved, so that: 



lim j- ^ ^{<Pactual(T)} = <t> actual (63) 
,-►00 ti ^ 

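Let $\overline{c}(t)$, $\overline{r}(t)$, $\overline{a}_m(t)$, $\overline{\mu}_m(t)$ represent the following time averages up to slot t: 

$$\overline{c}(t) \triangleq \frac{1}{t} \sum_{\tau=0}^{t-1} E\{c(A(\tau), X(\tau))\} \tag{64}$$

$$\overline{r}(t) \triangleq \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{k=1}^{K} E\{Z_k(\tau)\tilde{D}_k(\tau)(P_k(\tau) - \alpha_k)\} \tag{65}$$

$$\overline{a}_m(t) \triangleq \frac{1}{t} \sum_{\tau=0}^{t-1} E\{A_m(\tau)\} \tag{66}$$

$$\overline{\mu}_m(t) \triangleq \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{k=1}^{K} \beta_{mk} E\{Z_k(\tau)\tilde{D}_k(\tau)\} \tag{67}$$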

Because the system is initially empty, we cannot use more raw materials of type m 
up to time t than we have purchased, and hence: 

$$\overline{a}_m(t) \ge \overline{\mu}_m(t) \quad \text{for all } t \text{ and all } m \in \{1, \ldots, M\} \tag{68}$$

Further, note that the time average profit up to time t is given by: 

$$\frac{1}{t} \sum_{\tau=0}^{t-1} E\{\phi_{actual}(\tau)\} = -\overline{c}(t) + \overline{r}(t) \tag{69}$$
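We now have the following claim, proven at the end of this section. 

Claim 1: For each slot t and each $x \in \mathcal{X}$, $y \in \mathcal{Y}$, $a \in \mathcal{A}(x)$, $k \in \{1, \ldots, K\}$, there 
are functions $\theta(a,x,t)$, $\nu_k(y,t)$, and $d_k(y,t)$ such that: 

$$\overline{c}(t) = \sum_{x \in \mathcal{X}} \pi(x) \sum_{a \in \mathcal{A}(x)} \theta(a,x,t)\, c(a,x) \tag{70}$$

$$\overline{a}_m(t) = \sum_{x \in \mathcal{X}} \pi(x) \sum_{a \in \mathcal{A}(x)} \theta(a,x,t)\, a_m \tag{71}$$

$$\overline{r}(t) = \sum_{y \in \mathcal{Y}} \pi(y) \sum_{k=1}^{K} \nu_k(y,t) \tag{72}$$

$$\overline{\mu}_m(t) = \sum_{y \in \mathcal{Y}} \pi(y) \sum_{k=1}^{K} \beta_{mk}\, d_k(y,t) \tag{73}$$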

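and such that: 

$$0 \le \theta(a,x,t) \le 1, \quad \sum_{a \in \mathcal{A}(x)} \theta(a,x,t) = 1 \quad \forall x \in \mathcal{X},\ a \in \mathcal{A}(x) \tag{74}$$

and such that for each $y \in \mathcal{Y}$, $k \in \{1, \ldots, K\}$, the vector $(\nu_k(y,t); d_k(y,t))$ is in the 
convex hull of the following two-dimensional compact set: 

$$\Omega_k(y) \triangleq \{(\nu, \mu) \mid \nu = (p - \alpha_k) z F_k(p,y),\ \mu = z F_k(p,y),\ p \in \mathcal{P},\ z \in \{0,1\}\}. \tag{75}$$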

The values $[\theta(a,x,t); (\nu_k(y,t); d_k(y,t))]_{x \in \mathcal{X}, y \in \mathcal{Y}, a \in \mathcal{A}(x)}$ of Claim 1 can be viewed 
as a finite or countably infinite dimensional vector sequence indexed by time t 
that is contained in the compact set defined by (74) and the convex hull of (75). 
Hence, by a classical diagonalization procedure, every infinite sequence contains a 
convergent subsequence that is contained in the set [Billingsley 1986]. 

Consider the infinite sequence of times $\tilde{t}_i$ (for which (63) holds), and let $t_i$ 
represent the infinite subsequence for which $\theta(a,x,t_i)$, $\nu_k(y,t_i)$, and $d_k(y,t_i)$ converge. 
Let $\theta(a,x)$, $\nu_k(y)$, and $d_k(y)$ represent the limiting values. Define $\overline{c}$, $\overline{a}_m$, $\overline{r}$, and $\overline{\mu}_m$ 
as the corresponding limiting values of (70)-(73): 



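$$\overline{c} = \sum_{x \in \mathcal{X}} \pi(x) \sum_{a \in \mathcal{A}(x)} \theta(a,x)\, c(a,x)$$

$$\overline{a}_m = \sum_{x \in \mathcal{X}} \pi(x) \sum_{a \in \mathcal{A}(x)} \theta(a,x)\, a_m$$

$$\overline{r} = \sum_{y \in \mathcal{Y}} \pi(y) \sum_{k=1}^{K} \nu_k(y)$$

$$\overline{\mu}_m = \sum_{y \in \mathcal{Y}} \pi(y) \sum_{k=1}^{K} \beta_{mk}\, d_k(y)$$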



Further, the limiting values $\theta(a,x)$ retain the properties (74) and hence can be 
viewed as probabilities. Furthermore, taking limits as $t_i \to \infty$ in (68) and (69) 
yields: 

$$\overline{a}_m \ge \overline{\mu}_m \quad \text{for all } m \in \{1, \ldots, M\}$$

$$\overline{\phi}_{actual} = -\overline{c} + \overline{r}$$

Finally, note that for each $k \in \{1, \ldots, K\}$ and each $y \in \mathcal{Y}$, the vector $(\nu_k(y); d_k(y))$ 
is in the convex hull of the set (75), and hence can be achieved by an (X, Y)-only 
policy that chooses Z(t) and P(t) as a random function of the observed value of 
Y(t), such that $Z_k(t) \in \{0, 1\}$ and $P_k(t) \in \mathcal{P}_k$ for all k, and: 

$$\nu_k(y) = E\{(P_k(t) - \alpha_k) Z_k(t) F_k(P_k(t), y) \mid Y(t) = y\}$$

$$d_k(y) = E\{Z_k(t) F_k(P_k(t), y) \mid Y(t) = y\}$$

It follows that $\overline{\phi}_{actual}$ is an achievable value of $\phi$ for which there are appropriate 
auxiliary variables that satisfy the constraints of the optimization problem of 
Theorem 1. However, $\phi_{opt}$ is defined as the supremum over all such $\phi$ values, and hence 
we must have $\overline{\phi}_{actual} \le \phi_{opt}$. □ 

It remains only to prove Claim 1. 

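Proof. (Claim 1) We can re-write the expression for $\overline{c}(t)$ in (64) as follows: 

$$\overline{c}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{x \in \mathcal{X}} \sum_{a \in \mathcal{A}(x)} c(a,x)\, \pi(x) \Pr[A(\tau) = a \mid X(\tau) = x] = \sum_{x \in \mathcal{X}} \pi(x) \sum_{a \in \mathcal{A}(x)} \theta(a,x,t)\, c(a,x)$$

where $\theta(a,x,t)$ is defined: 

$$\theta(a,x,t) \triangleq \frac{1}{t} \sum_{\tau=0}^{t-1} \Pr[A(\tau) = a \mid X(\tau) = x]$$

and satisfies: 

$$0 \le \theta(a,x,t) \le 1, \quad \sum_{a \in \mathcal{A}(x)} \theta(a,x,t) = 1 \quad \forall t,\ x \in \mathcal{X}$$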

This proves (70). Likewise, we can rewrite the expression for $\overline{a}_m(t)$ in (66) as 
follows: 

$$\overline{a}_m(t) = \sum_{x \in \mathcal{X}} \pi(x) \sum_{a \in \mathcal{A}(x)} \theta(a,x,t)\, a_m$$

This proves (71). 
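
As an informal numerical illustration of (70), the sketch below checks that averaging the per-slot expected costs gives the same value as first averaging the conditional purchase probabilities into θ(a,x,t). The array layout, the function names, the random data, and the assumption that every supply state x shares one finite action set are choices made for this example only and do not come from the paper.

```python
import numpy as np


def theta_time_average(cond_probs):
    """Average Pr[A(tau)=a | X(tau)=x] over tau = 0..t-1.

    cond_probs has shape (t, n_x, n_a); the result has shape (n_x, n_a).
    """
    return np.asarray(cond_probs, dtype=float).mean(axis=0)


def avg_cost_direct(cond_probs, pi, cost):
    """Time-average expected cost computed slot by slot, as in (64)."""
    per_slot = np.einsum("x,txa,xa->t", pi, cond_probs, cost)
    return float(per_slot.mean())


def avg_cost_via_theta(cond_probs, pi, cost):
    """The same quantity written in the form of (70)."""
    theta = theta_time_average(cond_probs)
    return float(np.einsum("x,xa,xa->", pi, theta, cost))


# Hypothetical data: 50 slots, 3 supply states, 4 purchase actions.
rng = np.random.default_rng(0)
raw = rng.random((50, 3, 4))
cond_probs = raw / raw.sum(axis=2, keepdims=True)  # each row is a distribution
pi = np.array([0.2, 0.5, 0.3])                     # stationary Pr[X(t) = x]
cost = rng.random((3, 4))                          # cost[x, a] stands for c(a, x)

theta = theta_time_average(cond_probs)
direct = avg_cost_direct(cond_probs, pi, cost)
via_theta = avg_cost_via_theta(cond_probs, pi, cost)
```

Because the policy may change its conditional probabilities from slot to slot, θ(a,x,t) here is a genuine time average rather than a single fixed distribution; its rows still sum to one, which is property (74).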

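To prove (72)-(73), note that we can rewrite the expression for $\overline{r}(t)$ in (65) as 
follows: 

$$\overline{r}(t) = \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{y \in \mathcal{Y}} \sum_{k=1}^{K} \pi(y)\, E\{Z_k(\tau)(P_k(\tau) - \alpha_k)\tilde{D}_k(\tau) \mid Y(\tau) = y\} \tag{76}$$

Now for all $y \in \mathcal{Y}$, all vectors $z \in \{0,1\}^K$, $p \in \mathcal{P}^K$, and all slots t, define 
$\gamma_k(y,z,p,t)$ as follows: 

$$\gamma_k(y,z,p,t) \triangleq \begin{cases} \dfrac{E\{z_k \tilde{D}_k(t) \mid Y(t) = y, Z(t) = z, P(t) = p\}}{F_k(p_k, y)} & \text{if the denominator is non-zero} \\ 0 & \text{otherwise} \end{cases}$$

Note that $\tilde{D}_k(t) \le D_k(t)$, and so $0 \le \gamma_k(y,z,p,t) \le 1$. It follows by definition that: 

$$E\{z_k \tilde{D}_k(t) \mid Y(t) = y, Z(t) = z, P(t) = p\} = \gamma_k(y,z,p,t)\, F_k(p_k, y)$$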

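Using the above equality with iterated expectations in (76) yields: 

$$\overline{r}(t) = \sum_{y \in \mathcal{Y}} \sum_{k=1}^{K} \pi(y) \times \frac{1}{t} \sum_{\tau=0}^{t-1} E\{(P_k(\tau) - \alpha_k)\, \gamma_k(y, Z(\tau), P(\tau), \tau)\, F_k(P_k(\tau), y) \mid Y(\tau) = y\} \tag{77}$$

With similar analysis, we can rewrite the expression for $\overline{\mu}_m(t)$ in (67) as follows: 

$$\overline{\mu}_m(t) = \sum_{y \in \mathcal{Y}} \sum_{k=1}^{K} \beta_{mk}\, \pi(y) \times \frac{1}{t} \sum_{\tau=0}^{t-1} E\{\gamma_k(y, Z(\tau), P(\tau), \tau)\, F_k(P_k(\tau), y) \mid Y(\tau) = y\} \tag{78}$$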

Now define $\nu_k(y,t)$ and $d_k(y,t)$ as the corresponding time average expectations 
inside the summation terms of (77) and (78), respectively, so that: 

$$\overline{r}(t) = \sum_{y \in \mathcal{Y}} \sum_{k=1}^{K} \pi(y)\, \nu_k(y,t)$$

$$\overline{\mu}_m(t) = \sum_{y \in \mathcal{Y}} \sum_{k=1}^{K} \pi(y)\, \beta_{mk}\, d_k(y,t)$$

Note that the time average expectation over t slots used in the definitions of $\nu_k(y,t)$ 
and $d_k(y,t)$ can be viewed as an operator that produces a convex combination. 
Specifically, the two-dimensional vector $(\nu_k(y,t); d_k(y,t))$ can be viewed as an 
element of the convex hull of the following set $\hat{\Omega}_k(y)$: 

$$\hat{\Omega}_k(y) \triangleq \{(\nu, \mu) \mid \nu = (p - \alpha_k)\gamma F_k(p,y),\ \mu = \gamma F_k(p,y),\ p \in \mathcal{P},\ 0 \le \gamma \le 1\}$$

However, it is not difficult to show that the convex hull of the set $\hat{\Omega}_k(y)$ is the same 
as the convex hull of the set $\Omega_k(y)$ defined in (75).$^6$ This proves (72) and (73). □ 

$^6$ This can be seen by noting that $\Omega_k(y) \subset \hat{\Omega}_k(y) \subset Conv(\Omega_k(y))$ and then taking convex hulls of 
this inclusion relation. 

Appendix B — Proof of the 2-Price Theorem (Theorem 2) 

Proof. (Theorem 2) The proof follows the work of [Huang and Neely 2007]. 
For each product $k \in \{1, \ldots, K\}$ and each possible demand state $y \in \mathcal{Y}$, define 
constants $\hat{r}_k(y)$ and $\hat{d}_k(y)$ as follows: 

$$\hat{r}_k(y) \triangleq E\{Z_k(t)(P_k(t) - \alpha_k) F_k(P_k(t), y) \mid Y(t) = y\}$$

$$\hat{d}_k(y) \triangleq E\{Z_k(t) F_k(P_k(t), y) \mid Y(t) = y\}$$

where $Z(t) = (Z_1(t), \ldots, Z_K(t))$ and $P(t) = (P_1(t), \ldots, P_K(t))$ are the stationary 
randomized decisions given in the statement of Theorem 2. Thus, by (14) and (15): 

$$\sum_{k=1}^{K} \sum_{y \in \mathcal{Y}} \pi(y)\, \hat{r}_k(y) \ge \hat{r} \tag{79}$$

$$\sum_{k=1}^{K} \beta_{mk} \sum_{y \in \mathcal{Y}} \pi(y)\, \hat{d}_k(y) \le \hat{\mu}_m \quad \text{for all } m \in \{1, \ldots, M\} \tag{80}$$

Now consider a particular k, y, and define the two-dimensional set $\Omega(k,y)$ as follows: 

$$\Omega(k,y) \triangleq \{(r; d) \mid r = z(p - \alpha_k) F_k(p,y),\ d = z F_k(p,y),\ p \in \mathcal{P},\ z \in \{0,1\}\}$$

We now use the fact that if $\boldsymbol{C}$ is any general random vector that takes values in a 
general set $\mathcal{C}$, then $E\{\boldsymbol{C}\}$ is in the convex hull of $\mathcal{C}$ [Georgiadis et al. 2006]. Note 
that for any random choice of $Z_k(t) \in \{0,1\}$, $P_k(t) \in \mathcal{P}$, we have: 

$$(Z_k(t)(P_k(t) - \alpha_k) F_k(P_k(t), y);\ Z_k(t) F_k(P_k(t), y)) \in \Omega(k,y)$$

Hence, the conditional expectation of this random vector given Y(t) = y, given by 
$(\hat{r}_k(y); \hat{d}_k(y))$, is in the convex hull of $\Omega(k,y)$. Because $\Omega(k,y)$ is a two-dimensional 
set, any element of its convex hull can be expressed as a convex combination that 
uses at most three elements of $\Omega(k,y)$ (by Caratheodory's Theorem [Bertsekas 
et al. 2003]). Moreover, because the set $\mathcal{P}$ is compact and $F_k(p,y)$ is a continuous 
function of $p \in \mathcal{P}$ for each $y \in \mathcal{Y}$, the set $\Omega(k,y)$ is compact and hence any point 
on the boundary of its convex hull can be described by a convex combination of 
at most two elements of $\Omega(k,y)$ (see, for example, [Huang and Neely 2007]). Let 
$(\hat{r}^*_k(y); \hat{d}_k(y))$ be the boundary point with the largest value of the first entry given 
that the second entry is $\hat{d}_k(y)$. We thus have $\hat{r}^*_k(y) \ge \hat{r}_k(y)$, and writing the convex 
combination with two elements we have: 

$$(\hat{r}^*_k(y); \hat{d}_k(y)) = \eta^{(1)} \left(z^{(1)}(p^{(1)} - \alpha_k) F_k(p^{(1)}, y);\ z^{(1)} F_k(p^{(1)}, y)\right) + \eta^{(2)} \left(z^{(2)}(p^{(2)} - \alpha_k) F_k(p^{(2)}, y);\ z^{(2)} F_k(p^{(2)}, y)\right)$$

for some set of decisions $(z^{(1)}, p^{(1)})$ and $(z^{(2)}, p^{(2)})$ (with $z^{(i)} \in \{0,1\}$, $p^{(i)} \in \mathcal{P}$) and 
probabilities $\eta^{(1)}$ and $\eta^{(2)}$ such that $\eta^{(1)} + \eta^{(2)} = 1$. Note that these $z^{(i)}, p^{(i)}, \eta^{(i)}$ 
values are determined for a particular (k, y), and hence we can relabel them as 
$z^{(i)}_k(y)$, $p^{(i)}_k(y)$, and $\eta^{(i)}_k(y)$ for $i \in \{1, 2\}$. 

Now define the following stationary randomized policy: For each product 
$k \in \{1, \ldots, K\}$, if Y(t) = y, independently choose $Z^*_k(t) = z^{(1)}_k(y)$ and $P^*_k(t) = p^{(1)}_k(y)$ 
with probability $\eta^{(1)}_k(y)$, and else choose $Z^*_k(t) = z^{(2)}_k(y)$ and $P^*_k(t) = p^{(2)}_k(y)$. It 
follows that for a given value of y, this policy uses at most two different prices for 
each product.$^7$ Further, we have: 

$$\hat{r}_k(y) \le E\{Z^*_k(t)(P^*_k(t) - \alpha_k) F_k(P^*_k(t), y) \mid Y(t) = y\}$$

$$\hat{d}_k(y) = E\{Z^*_k(t) F_k(P^*_k(t), y) \mid Y(t) = y\}$$

Summing these conditional expectations over $k \in \{1, \ldots, K\}$ and $y \in \mathcal{Y}$ and using 
(79)-(80) yields the result. □ 
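
The two-price construction can be illustrated on a finite price grid. The sketch below is not the authors' procedure: it simply brute-forces every pair of candidate decisions (z, p) and keeps the mixture that attains a target expected demand with the largest expected profit, which is the boundary point used in the proof. The function name `two_price_mixture`, the price grid, the linear demand curve, and the choice α = 0 are all hypothetical.

```python
def two_price_mixture(prices, demand, alpha, d_target, tol=1e-12):
    """Best mixture of at most two decisions (z, p) whose mean demand is d_target.

    Returns (profit, eta1, decision1, decision2), where decision1 is used with
    probability eta1 and decision2 with probability 1 - eta1.
    """
    # Candidate points (z, p, d, r); z = 0 means "do not sell", giving (d, r) = (0, 0).
    pts = [(0, prices[0], 0.0, 0.0)]
    for p in prices:
        f = demand(p)
        pts.append((1, p, f, (p - alpha) * f))

    best = None
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            d1, r1 = pts[i][2], pts[i][3]
            d2, r2 = pts[j][2], pts[j][3]
            if d_target < min(d1, d2) - tol or d_target > max(d1, d2) + tol:
                continue  # this pair cannot average to the target demand
            if abs(d2 - d1) <= tol:
                eta1 = 1.0 if r1 >= r2 else 0.0
            else:
                eta1 = min(1.0, max(0.0, (d2 - d_target) / (d2 - d1)))
            profit = eta1 * r1 + (1.0 - eta1) * r2
            if best is None or profit > best[0]:
                best = (profit, eta1, pts[i][:2], pts[j][:2])
    if best is None:
        raise ValueError("d_target is outside the achievable demand range")
    return best


# Hypothetical example: four prices, linear demand F(p) = 5 - p, zero unit cost.
prices = [1.0, 2.0, 3.0, 4.0]
profit, eta1, dec1, dec2 = two_price_mixture(prices, lambda p: 5.0 - p, 0.0, 2.5)
```

Searching pairs is sufficient here because maximizing the first entry subject to a fixed second entry is a linear program with two equality constraints, so an optimal mixture with at most two support points exists. In this example no single grid price yields demand 2.5, and the best mixture randomizes equally between prices 2 and 3.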

Appendix C — Proof of Lemma 4 

Proof. (Lemma 4) Using the queueing dynamic equation (19) (which holds 
because $Q_m(t) \ge \mu_{m,max}$ for all t), it is easy to show: 

$$\frac{1}{2}(Q_m(\tau+1) - \theta_m)^2 - \frac{1}{2}(Q_m(\tau) - \theta_m)^2 = \frac{1}{2}(A_m(\tau) - \mu_m(\tau))^2 + (Q_m(\tau) - \theta_m)[A_m(\tau) - \mu_m(\tau)]$$
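
The identity above can be checked numerically. The sketch below assumes the update has the form Q(τ+1) = Q(τ) − μ(τ) + A(τ), which is how the dynamic is used in this proof when the queue is large enough; the function names and the sampling ranges are illustrative choices, not taken from the paper.

```python
import random


def drift_lhs(q, a, mu, theta):
    """Half the change in (Q - theta)^2 over one slot, under the assumed update."""
    q_next = q - mu + a  # assumed form of the queue update
    return 0.5 * (q_next - theta) ** 2 - 0.5 * (q - theta) ** 2


def drift_rhs(q, a, mu, theta):
    """Right-hand side of the one-slot identity."""
    return 0.5 * (a - mu) ** 2 + (q - theta) * (a - mu)


def max_identity_gap(trials=1000, seed=0):
    """Largest absolute difference between the two sides over random samples."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        mu = rng.uniform(0.0, 5.0)
        a = rng.uniform(0.0, 5.0)
        q = rng.uniform(5.0, 100.0)   # kept at least as large as any mu
        theta = rng.uniform(0.0, 200.0)
        gap = max(gap, abs(drift_lhs(q, a, mu, theta) - drift_rhs(q, a, mu, theta)))
    return gap
```

For instance, with Q = 10, A = 3, μ = 1, and θ = 4, both sides evaluate to 14.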

Now summing over $m \in \{1, \ldots, M\}$ and adding to both sides the term $-V\phi(\tau)$, we 
have: 

$$\Delta_1(\tau) - V\phi(\tau) \le B - V\phi(\tau) + \sum_{m=1}^{M} (Q_m(\tau) - \theta_m)[A_m(\tau) - \mu_m(\tau)],$$

where B is defined in (23), and $\Delta_1(\tau) \triangleq \frac{1}{2} \sum_{m=1}^{M} [(Q_m(\tau+1) - \theta_m)^2 - (Q_m(\tau) - \theta_m)^2]$ 
is the 1-step sample path drift of the Lyapunov function at time $\tau$. Now for any 
$t_0 \le \tau \le t_0 + T - 1$, we can take expectations over the above equation conditioning 
on $H_T(t_0)$ to get: 

$$E\{\Delta_1(\tau) \mid H_T(t_0)\} - V E\{\phi(\tau) \mid H_T(t_0)\} \le B - V E\{\phi(\tau) \mid H_T(t_0)\} + \sum_{m=1}^{M} E\{(Q_m(\tau) - \theta_m)[A_m(\tau) - \mu_m(\tau)] \mid H_T(t_0)\}. \tag{81}$$

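However, using iterated expectations in the last term as in (25), we see that: 

$$\begin{aligned}
&\sum_{m=1}^{M} E\{(Q_m(\tau) - \theta_m)[A_m(\tau) - \mu_m(\tau)] \mid H_T(t_0)\} \\
&\quad = \sum_{m=1}^{M} E\{(Q_m(\tau) - \theta_m)\, E\{A_m(\tau) - \mu_m(\tau) \mid Q(\tau), P(\tau), Z(\tau), H_T(t_0)\} \mid H_T(t_0)\} \\
&\quad = \sum_{m=1}^{M} E\Big\{(Q_m(\tau) - \theta_m)\Big[A_m(\tau) - \sum_{k=1}^{K} \beta_{mk} Z_k(\tau) F_k(P_k(\tau), Y(\tau))\Big] \,\Big|\, H_T(t_0)\Big\}
\end{aligned}$$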



7 Further, given the observed value of Y(t) = y, this policy makes pricing decisions independently 
for each product k. 
Plugging this back into (81), we get: 

$$E\{\Delta_1(\tau) \mid H_T(t_0)\} - V E\{\phi(\tau) \mid H_T(t_0)\} \le B - V E\{\phi(\tau) \mid H_T(t_0)\} + \sum_{m=1}^{M} E\Big\{(Q_m(\tau) - \theta_m)\Big[A_m(\tau) - \sum_{k=1}^{K} \beta_{mk} Z_k(\tau) F_k(P_k(\tau), Y(\tau))\Big] \,\Big|\, H_T(t_0)\Big\}. \tag{82}$$

Now since, given the $Q(\tau)$ values on slot $\tau$, the JPP algorithm minimizes the right 
hand side of the above equation at time $\tau$, we indeed have: 

$$E\{\Delta_1(\tau) \mid H_T(t_0)\} - V E\{\phi(\tau) \mid H_T(t_0)\} \le B - V E\{\phi^*(\tau) \mid H_T(t_0)\} + \sum_{m=1}^{M} E\Big\{(Q_m(\tau) - \theta_m)\Big[A^*_m(\tau) - \sum_{k=1}^{K} \beta_{mk} Z^*_k(\tau) F_k(P^*_k(\tau), Y(\tau))\Big] \,\Big|\, H_T(t_0)\Big\}, \tag{83}$$

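where $\phi^*(\tau)$, $A^*_m(\tau)$, $Z^*_k(\tau)$, and $P^*_k(\tau)$ are variables generated by any other policy. 
Summing (83) from $\tau = t_0$ to $\tau = t_0 + T - 1$, we thus have: 

$$\Delta_T(t_0) - V \sum_{\tau=t_0}^{t_0+T-1} E\{\phi(\tau) \mid H_T(t_0)\} \le TB - V \sum_{\tau=t_0}^{t_0+T-1} E\{\phi^*(\tau) \mid H_T(t_0)\} + \sum_{m=1}^{M} \sum_{\tau=t_0}^{t_0+T-1} E\Big\{(Q_m(\tau) - \theta_m)\Big[A^*_m(\tau) - \sum_{k=1}^{K} \beta_{mk} Z^*_k(\tau) F_k(P^*_k(\tau), Y(\tau))\Big] \,\Big|\, H_T(t_0)\Big\}. \tag{84}$$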

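Now using the fact that for any t and $\tau$, $|Q_m(t + \tau) - Q_m(t)| \le \tau \max[A_{m,max}, \mu_{m,max}]$, 
we get: 

$$\begin{aligned}
&\sum_{m=1}^{M} \sum_{\tau=t_0}^{t_0+T-1} (Q_m(\tau) - \theta_m)\Big[A^*_m(\tau) - \sum_{k=1}^{K} \beta_{mk} Z^*_k(\tau) F_k(P^*_k(\tau), Y(\tau))\Big] \\
&\quad \le B' + \sum_{m=1}^{M} \sum_{\tau=t_0}^{t_0+T-1} (Q_m(t_0) - \theta_m)\Big[A^*_m(\tau) - \sum_{k=1}^{K} \beta_{mk} Z^*_k(\tau) F_k(P^*_k(\tau), Y(\tau))\Big]
\end{aligned}$$

where: 

$$B' \triangleq \frac{T(T-1)}{2} \sum_{m=1}^{M} \max[A^2_{m,max}, \mu^2_{m,max}] = T(T-1)B$$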

Here B is defined in (23). Plugging this into (84) and using the fact that, 
conditioning on $H_T(t_0)$, $Q(t_0)$ is a constant, we get: 

$$\Delta_T(t_0) - V \sum_{\tau=t_0}^{t_0+T-1} E\{\phi(\tau) \mid H_T(t_0)\} \le TB + B' - V \sum_{\tau=t_0}^{t_0+T-1} E\{\phi^*(\tau) \mid H_T(t_0)\} + \sum_{m=1}^{M} (Q_m(t_0) - \theta_m) \sum_{\tau=t_0}^{t_0+T-1} E\Big\{A^*_m(\tau) - \sum_{k=1}^{K} \beta_{mk} Z^*_k(\tau) F_k(P^*_k(\tau), Y(\tau)) \,\Big|\, H_T(t_0)\Big\}$$

Noting that $TB + B' = T^2 B$ proves Lemma 4. □ 
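
The step that replaces $Q_m(\tau)$ by $Q_m(t_0)$ at the cost of the constant B' can be sanity-checked on simulated sample paths. The sketch below handles a single material (M = 1), draws arrivals and usage uniformly at random within their bounds, and assumes the queue stays large enough that its update is Q(τ+1) = Q(τ) − μ(τ) + A(τ); the function name and all numeric values are hypothetical.

```python
import random


def check_slack_bound(T, a_max, mu_max, theta, q0, seed):
    """Return (lhs, rhs) of the B' inequality for one random sample path, M = 1."""
    rng = random.Random(seed)
    # B' for a single material: T(T-1)/2 * max[A_max^2, mu_max^2].
    b_prime = 0.5 * T * (T - 1) * max(a_max ** 2, mu_max ** 2)
    q = q0
    lhs = 0.0
    rhs = b_prime
    for _ in range(T):
        # Net input A*(tau) - mu*(tau) of an arbitrary alternative policy.
        delta_star = rng.uniform(0.0, a_max) - rng.uniform(0.0, mu_max)
        lhs += (q - theta) * delta_star
        rhs += (q0 - theta) * delta_star
        # The actual queue moves by at most max(a_max, mu_max) per slot.
        q += rng.uniform(0.0, a_max) - rng.uniform(0.0, mu_max)
    return lhs, rhs


results = [check_slack_bound(T=20, a_max=3.0, mu_max=2.0, theta=50.0,
                             q0=500.0, seed=s) for s in range(20)]
```

The bound holds on every path because the j-th term differs by at most j · max[A_max, μ_max]², and summing j from 0 to T − 1 gives T(T − 1)/2.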